upCast 8.0 Reference Manual

Christian Roth

C.R.

Revision History
Revision 1, Fri, 16 Sep 2016 12:54:00 CEST

1. upCast Overview
1. What is it?
2. System requirements
2. upCast Architecture
1. The pipeline component
2. Pipeline views
3. The module component
4. Parameter Sets
5. Conversion phases
3. upCast UI
1. Running upCast in GUI mode
1.1. Windows
1.2. Mac OS X
1.3. Unix, Linux
1.4. From the commandline
2. The pipeline document window
3. File menu
3.1. New
3.2. New from Template
3.3. Open…
3.4. Open Recent
3.5. Close
3.6. Save
3.7. Save as…
3.8. Save to Parameter Set…
3.9. Export to Ant
3.10. Export to Java Source…
3.11. Export to XML…
3.12. Generate documentation…
4. Edit menu
4.1. Cut / Copy / Paste
4.2. Copy as UPL
4.3. Duplicate
5. Pipeline menu
5.1. Run
5.2. Simple View
6. Help menu
6.1. Information…
6.2. License Agreement
6.3. Logfile
6.4. upCast Documentation…
6.5. UPL Documentation
6.6. Javadoc API Documentation…
6.7. Send Feedback…
4. The variable system
1. Variable reference syntax
2. The global realm
3. The application realm
4. The environment realm
5. The pipeline realm
6. The module realm
7. The javaproperty realm
8. The include realm
9. Parameter and Variable Types
5. Application-level Settings
1. Application Preferences
1.1. Application Settings
1.2. Catalogs
1.3. Font Configuration
1.4. Encodings
1.5. License
6. Pipeline-level Settings
1. Typographical conventions
2. Special pre-defined pipeline variables
2.1. PipelineBase, base
2.2. PipelineURI
2.3. ParamBase
2.4. ParamURI
2.5. PipelineInstanceId
3. Pipeline Settings
3.1. Pipeline Parameters
3.2. Settings
3.3. Catalogs
3.4. Font Configuration
3.5. Encodings
3.6. License
3.7. Export
3.8. Documentation
7. Module-level settings
1. Typographical conventions
2. Module Options (common for all modules)
3. Action Settings (module-specific)
8. Modules
1. Pipeline Variables [pipelinevars]
2. RTF Importer (upCast) [rtfimport]
3. UPL Processor [uplcode]
4. UPL Tree-Processor [upl]
5. Sectioner [sectioner]
5.1. Handling of uci:part elements
5.2. Handling of other elements
6. [DEPRECATED] Grouper [grouper]
7. XML Importer [xmlimport]
8. XML Exporter [xmlexport]
9. Commandline Processor [commandline]
10. XSLT Processor [xslt]
11. Unicode Translation Processor [unicodetranslator]
12. XML Validator [validator]
13. CSS Exporter [css]
14. RTF Exporter ("downCast") [rtfexport]
15. External Pipeline Processor [extpipeline]
9. Parameter Sets
1. What parameter sets are
2. What parameter sets contain
3. How parameter sets work
4. Creating parameter sets
5. Variable: ${pipeline:ParamBase}
6. What happens when…
6.1. …the pipeline implementation’s number or type of parameters changes?
6.2. …I change a pipeline implementation while a depending parameter set is open?
6.3. …the Pipeline UID changes and parameter sets using the old id already exist?
6.4. …there is no mapping in the catalog system for a certain parameter set UID?
10. Grouping using Painters
1. The Painter concept
1.1. Tagging nodes and placing the painters
1.2. Painting the nodes
2. Node Tags
3. Painter Types
3.1. start-end
3.2. start*end
3.3. start-here
3.4. start*here
3.5. here-end
3.6. here*end
3.7. start-start
3.8. start*start
3.9. end-end
3.10. end*end
3.11. this
4. Grouping algorithm
5. Examples
5.1. Grouping by paragraph class
5.2. Grouping with a known start element
11. XML Namespaces in upCast
1. The upcast-internal namespace (uci)
2. The upcast-css namespace (css)
3. The upcast-cssoverride namespace (csso)
4. The upcast-cssclass namespace (cssc)
5. The upcast-cals namespace (cals)
6. The HTML namespace (html)
7. The XLink namespace (xlink)
8. The XML namespace (xml)
9. The Variable Realm namespaces
10. UPL Utility functions library namespace
12. Recognized Java system properties
13. Commandline Interface
1. How it works
2. Synopsis
3. Exit codes
3.1. upCast exit codes
3.2. Custom exit code pipeline:ModuleResult
14. Java API (low-level)
1. Concepts
2. Using the API
2.1. General programming steps
2.2. Setup
2.3. Building and running a pipeline
3. Connecting to WordLink (Windows only)
3.1. Accessing Word as COM object in a restricted environment
4. API Error handling
4.1. Coding pattern
4.2. Error handling tidbits
15. upcast-runner Ant task
1. Defining the upcast-runner task
2. Structure of the upcast-runner task
2.1. upcast-runner
2.2. source
2.3. licensefile
2.4. catalogs
2.5. catalog
2.6. param
16. upcast Ant task
1. Defining the upcast task
2. Structure of the upcast task
2.1. upcast
2.2. source
2.3. settings
2.4. licensefile
2.5. logging
2.6. catalogs
2.7. catalog
2.8. encodings
2.9. encoding
2.10. fontconfig
2.11. parameters
2.12. pipeline
2.13. module
2.14. param
17. Logging Architecture
1. Log Event processing
1.1. Global Log Event path
1.2. Component's Log Event path
1.3. Terminate Pipeline Execution Signal
2. Component filter defaults
3. Special situations
3.1. Exception in component initialization
4. Filter specification syntax
18. Pipeline Templates
19. Standard Folders and Locations
20. Unicode translation map
1. Syntax
2. Options
2.1. @charref
2.2. @fill
2.3. @invalid-xmlchar
2.4. @invalid-xmlchar-attr
21. CSS property unit table
1. Syntax
2. Options
2.1. @option-default-length-unit
2.2. @option-default-length-precision
22. Fonts and Encodings
1. Font Configuration
1.1. Properties and Values
1.2. Options
1.3. File structure
1.4. Matching Algorithm
1.5. Default font configuration
2. Custom Encodings
2.1. How it works
2.2. Associating a Font with an Encoding
2.3. File format
23. Troubleshooting
1. Finding out basic version info of an upcast.jar
2. Finding out extended environment info
3. Extended log info
24. Copyright, Licenses, Legal, Acknowledgements
1. Copyright, Licenses, Legal
1.1. upCast
1.2. Steadystate CSS2 parser
1.3. Xerces, Xalan
1.4. Apache Commons
1.5. swing-layout (org.jdesktop.*)
1.6. W3C
1.7. XML- and OASIS Catalog Support
1.8. Saxon 6.x
1.9. Saxon-B 9.x
1.10. MRJAdapter
1.11. Jaxen
1.12. Jing
1.13. Redstone XML-RPC Library
1.14. Apache Ant
1.15. LogBridge
1.16. JTimepiece
1.17. XMLUnit
1.18. RSyntaxTextArea
1.19. Apache POI
1.20. Simple 4.1.21
1.21. Flying Saucer
1.22. BrowserLauncher2
2. Acknowledgements and Thanks

Important Note

This document is intended as a technical reference manual to upCast RT (in the following called just upCast).

It is not intended as a tutorial on how to use upCast efficiently, best practices for creating pipelines or similar topics. These will be covered in separate tutorial-style documents, a How-To section on our website as well as a Frequently Asked Questions document. Please turn to these documents as they are published on our website http://www.upcast.de/ in the near future.

Note

This reference document describes upCast RT 8.0.

Chapter 1. upCast Overview

1. What is it?

upCast is a module-based document processing pipeline tool, specializing in legacy, "flat" and layout-driven content. It comes with pre-defined, configurable, task-oriented modules (performing operations such as importing data, XSLT processing, serialization and validation) that you can arrange in any order to create a pipeline. Pipelines can be saved and parameterized as a whole and then run from within upCast’s UI, from the commandline or directly from Java.

Pipelines can be set up to be fully relative in their file addressing and can therefore be shared without modifications between computers, even across different platforms.

2. System requirements

To run upCast, you must meet the following minimum requirements:

  • Java Runtime Environment 7.0 or later ("Java 7")

  • Xerces 2.11.x or later (upCast includes Xerces 2.11.0 and does not work on systems that have versions earlier than Xerces 2.9 on their classpath)

  • 1024 MB of Java heap available to upCast (depending on document size and pipeline configuration, actual memory requirements may be lower or higher)

  • Display resolution of at least 1280 x 1024 (when running the graphical development environment)

Chapter 2. upCast Architecture

1. The pipeline component

The highest-level component type in upCast is a document processing pipeline, or pipeline for short. Pipelines can be saved into documents (file extension .ucdoc) and recalled at any time. Complete pipelines can be exported into several formats, such as a Java source file or source code for an Ant target.

2. Pipeline views

To the user, upCast presents its functionality in two layers: the so-called "Simple View" and the "Edit View". Think of the Simple View as a simplifying, user-oriented layer over the Edit View, which is developer-oriented and shows the actual, fine-grained and possibly complex implementation of the conversion pipeline.

Simple View as a user-oriented layer over the detailed Edit View on a pipeline’s implementation

3. The module component

Pipelines are made up of modules. Each module performs a specific, specialized task. Based on the tasks they perform, modules fall into three categories: importers, processors and exporters.

Importers import documents into the internal document format. upCast currently includes a high-quality RTF/Word importer.

Processors come in two variants for internal and external processing. Internal processors modify the current, internal document representation. This is carried out in-place. External processors can be used to perform general tasks which are not dependent on the internal document, like running a shell command.

Exporters are used to serialize the internal document or part thereof in one of several formats.

Within a pipeline, at any time during execution there’s exactly one internal document representation the tasks are performed on. This means that modifications are in most cases performed in-place, so changes made to the internal document tree by one module are visible to subsequent ones.

While a run of an importer always replaces the internal document, you can have several exporters that serialize the same internal document in different ways. You can also serialize the document at any point in the pipeline and apply additional modifications using processors afterwards.
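
The relationship between the three module categories and the single internal document can be pictured with a small model. The following Python sketch is purely illustrative and is not the upCast API: the names run_pipeline, importer, uppercase_title and make_exporter are invented for this example, and the internal document is reduced to a plain dictionary.

```python
from typing import Callable, Optional

# Illustrative model only; none of these names exist in upCast.
Document = dict
Module = Callable[[Optional[Document]], Optional[Document]]


def importer(_current: Optional[Document]) -> Document:
    """An importer always replaces the internal document."""
    return {"title": "report", "paragraphs": ["first", "second"]}


def uppercase_title(doc: Optional[Document]) -> Optional[Document]:
    """An internal processor modifies the current document in place."""
    doc["title"] = doc["title"].upper()
    return doc


def make_exporter(sink: list) -> Module:
    """An exporter serializes the document and leaves it unchanged."""
    def export(doc):
        sink.append(f"{doc['title']}: {len(doc['paragraphs'])} paragraphs")
        return doc
    return export


def run_pipeline(modules: list) -> Optional[Document]:
    """Run modules in order; each one sees the changes made by its predecessors."""
    doc = None
    for module in modules:
        doc = module(doc)
    return doc


exports: list = []
final_doc = run_pipeline([
    importer,
    make_exporter(exports),   # serialize right after the import
    uppercase_title,
    make_exporter(exports),   # serialize again after processing
])
```

The two exporter entries show that the same internal document can be serialized at different points of the pipeline, with the second export reflecting the processor's in-place change.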

4. Parameter Sets

It is often useful to be able to save, quickly recall and share with other users different parameter settings for running a certain, parameterized pipeline. Such a parameter set can be saved into documents (file extension .ucpar) and recalled at any time.

A parameter set document only contains the pipeline parameter values as they are set in the Simple View at the time of saving. Only parameters that have their persistent property set to true are saved in that document. It also contains the Pipeline UID of the pipeline document it is based on so that it can load its implementation for execution.
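
The saving rule described above can be sketched as follows. This is a hypothetical Python model and not the actual .ucpar file format: Parameter, build_parameter_set, the dictionary layout and the sample UID are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Parameter:
    name: str
    value: str
    persistent: bool  # mirrors the "persistent" property described above


def build_parameter_set(pipeline_uid: str, parameters: list) -> dict:
    """Keep only persistent parameters and record the Pipeline UID."""
    return {
        "pipeline_uid": pipeline_uid,
        "values": {p.name: p.value for p in parameters if p.persistent},
    }


params = [
    Parameter("sourceFile", "in.rtf", persistent=True),
    Parameter("scratchDir", "/tmp/work", persistent=False),
]
parameter_set = build_parameter_set("urn:example:my-pipeline", params)
```

In this model, scratchDir is dropped because it is not persistent, while the recorded UID is what would allow the implementation to be located again.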

For details, see the section on Parameter Sets.

5. Conversion phases

Usually, a conversion is a three-phase process: You import the source data into the application, process the data, and export the result. Sometimes, a fourth, external post-processing step is added. upCast offers various modules, which can be divided into three different classes: Importers, Processors and Exporters.

Here’s a diagram of a typical upCast pipeline (with the internal document indicated over time):

upCast sample pipeline with corresponding internal document life-span; list of modules

Chapter 3. upCast UI

1. Running upCast in GUI mode

1.1. Windows

Download the Windows installer package and run the installer. It will create a customized Java launcher, create shortcuts and register appropriate file type associations. You then launch upCast by clicking the upCast RT application icon.

1.2. Mac OS X

Download the disk image file (.dmg), mount it by double-clicking and copy the upCast RT application to your Applications folder (or at any other place you wish). Start upCast by double-clicking the application icon.

1.3. Unix, Linux

Download the Unix installer and run it, then run the installed upCast launcher.

1.4. From the commandline

Download the upcast.jar file and launch it from the commandline using

java -Xmx1024m -jar upcast.jar

This is short for

java -Xmx1024m -classpath upcast.jar de.infinityloop.upcast.AppUI

1.4.1. Commandline options

There are a few additional commandline options that are supported. Here's the synopsis:

java -jar upcast.jar parameters...

with parameters being either of the following variants of option sets:

Variant 1

[0..n]

absolute path(s) to the pipeline or parameter set document(s) to be opened initially; any files passed here will override the "Re-open documents that were open when last quitting" setting in the application's preferences

[n+1..n+m]

standard options

Standard options are as follows:

-catalog <file>+

one or more (XML-) catalog files to set up as global upCast catalogs before further processing. Setting this option is essential when wanting to initially open parameter sets (*.ucpar) that rely on resolving their PUBLIC identifier to find the corresponding pipeline implementation file.

Example 3.1. 

java -jar upcast.jar myfile1.ucdoc myfile2.ucpar -catalog catalog1 catalog2

starts upCast in GUI mode, initially opening myfile1.ucdoc and myfile2.ucpar, with the catalog system set up to use the catalog files catalog1 and catalog2.
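
When upCast is launched from a script, the argument order of Variant 1 (documents first, standard options afterwards) is easy to get wrong. The following Python sketch assembles such an argument list. build_gui_command is a hypothetical helper written for this example, and only the -catalog option documented above is used.

```python
def build_gui_command(documents, catalogs=(), jar="upcast.jar"):
    """Return the argument list: documents first, then the -catalog option."""
    command = ["java", "-jar", jar, *documents]
    if catalogs:
        command += ["-catalog", *catalogs]
    return command


command = build_gui_command(
    ["myfile1.ucdoc", "myfile2.ucpar"],
    catalogs=["catalog1", "catalog2"],
)
# Pass the list to subprocess.run(command) to actually start upCast.
```

Building the call as a list rather than a single string avoids shell quoting problems with paths that contain spaces.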


Variant 2

-info

will display extensive version and execution environment information of the upCast application as present in the respective upcast.jar file

Example 3.2. 

java -jar upcast.jar -info

will print extensive version and environment information to the console.


Variant 3

-version

will display upCast version and build info

Example 3.3. 

java -jar upcast.jar -version

will print version number, build number and build date to the console.


Variant 4

-buildnumber

will print the upCast raw build number, followed by a newline character, to the console

Example 3.4. 

java -jar upcast.jar -buildnumber

will print the build number to stdout.
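
Because -buildnumber prints only the raw number followed by a newline, it lends itself to scripted version checks. A minimal Python sketch, assuming that java is on the PATH and that upcast.jar is in the working directory; both helper names are hypothetical.

```python
import subprocess


def parse_build_number(output: str) -> int:
    """Convert the raw '-buildnumber' output (number plus newline) to an int."""
    return int(output.strip())


def read_build_number(jar: str = "upcast.jar") -> int:
    """Run 'java -jar <jar> -buildnumber' and return the parsed build number."""
    result = subprocess.run(
        ["java", "-jar", jar, "-buildnumber"],
        capture_output=True, text=True, check=True,
    )
    return parse_build_number(result.stdout)
```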


2. The pipeline document window

The upCast UI is designed to be simple and effective. An upCast document is a complete pipeline setup and can be saved in a file with the default extension .ucdoc. Each document is shown in its own window, and you can have several pipelines open at the same time.

A document window in edit mode is divided into three parts.

The left pane shows the sequence of modules that make up the pipeline. The position of a selected module in the pipeline can be changed by using the nudge-up/nudge-down controls at the bottom of the list. A module can be deleted from the pipeline with the "–" control, a module can be added by clicking the "+" control and choosing the desired class from the popup. There can be multiple instances of the same module type in a pipeline as required, e.g. two or more XSLT processors.

A module can have the following decorators:

The right pane shows the parameters for the currently selected module. Only one module can be selected at any time. Changes to a module’s parameters are effective immediately.

At the bottom of the window, the pipeline execution controls are placed for executing a pipeline, stopping it underway and checking its progress.

An upCast document window showing a pipeline setup with the XML Exporter selected

Note

This display is replaced by a dynamically generated, forms-like interface when the Simple View option is engaged.

3. File menu

3.1. New

This command creates a new, empty pipeline document.

3.2. New from Template

This command lets you create either a parameter set based on a factory-supplied template or a new, independent, self-contained pipeline configuration from one of the available templates.

create parameter set…

Creates a parameter set from the respective template’s main pipeline document.

Note

The advantage of just creating a parameter set is that if you do not need to tweak the implementation, but just use the pipeline template’s functionality as-is only with variable parameter values, you will benefit from updates and bugfixes to the template automatically without any further manual intervention required. This comes from the fact that the parameter set only holds a reference to the actual template implementation and therefore is automatically updated when the implementation is.

create independent copy…

Creates a full, physical copy of all the pipeline documents and resources the template is made up of. You are asked for the location (folder) and a base name for the new pipeline. Within the selected folder, a new folder by the specified name is created and any resources of the template, including the pipeline document, are copied into that folder.

Note

A pipeline created from the chosen template is completely independent of its template. This means two things:

  • you get a complete, independent copy of the original template definition and resources

  • any updates to the template are not propagated forward to any pipelines you already have created based on an older version of the template

You can create your own, specific templates. For details on what makes a pipeline an upCast template and where to put those templates for upCast to recognize them, see the chapter on Pipeline Templates.

3.3. Open…

This shows a file chooser where you can open an already existing pipeline or parameter set document.

3.4. Open Recent

Shows a sub-menu with the pipeline and parameter set documents you opened most recently. The number of items displayed in the sub-menu can be set in upCast’s preferences.

Pipeline or parameter set documents you had open recently, but which are no longer available (for example because they have been deleted or the disk they reside on is currently not mounted) are shown in disabled state.

3.5. Close

Closes the top-most document. When changes to this document have not yet been saved, you are prompted to save them.

3.6. Save

Saves the top-most document, which can be a pipeline or parameter set document, the log window or the system information window.

3.7. Save as…

This allows you to save the top-most window under a new name.

Note

For pipeline documents that refer to required resources by relative paths, saving the document to a different location will usually break those links, and the pipeline will not run as expected. upCast cannot reliably track those resource links and copy the resources along automatically.

3.8. Save to Parameter Set…

This lets you save the persistent parameters of the top-most pipeline document, together with their values, to a separate file, a parameter set document. This file internally links back to the pipeline document it was created from. This allows you to store configurations of parameter values separately; they look like separate pipelines, but share one single pipeline implementation. When the latter gets updated, so do all parameter sets originating from it.

See the section on parameter sets for more information on how the linking to the respective pipeline document works and what the restrictions of parameter sets are.

3.9. Export to Ant

Saves the current pipeline document in the form of an Ant task. Additional parameters can be set for the export operation in the Pipeline Settings dialog under the Export tab.

using ‘upcast-runner’ task

exports an Ant task making use of an upCast runner object, which reads the specified pipeline and executes it. This is the recommended export option since you need to generate that task only once and it picks up automatically any changes in the referenced pipeline document.

as self-contained Task

this creates a fully self-contained Ant task of the current pipeline’s configuration. This means that the task can be run without having access to the original pipeline document it was generated from. This may be useful when you used the original pipeline document only for prototyping and testing and want to apply changes directly to the Ant task’s definition thereafter, or when you can recreate the task automatically after making changes to the pipeline document (e.g. in an automated build using upCast’s Tools class).

3.10. Export to Java Source…

Saves the current pipeline document as Java Source code. Additional parameters can be set for the export operation in the Pipeline Settings dialog under the Export tab.

using RunPipeline class

exports Java source code making use of upCast’s RunPipeline class, which reads the specified pipeline and executes it. This is the recommended export option since you need to re-generate the source code for that class only when the pipeline parameter configuration changes (i.e., parameters are added or removed) and it picks up automatically any further changes in the referenced pipeline document.

as self-contained source

this creates a fully self-contained Java class of the current pipeline’s configuration, using the methods of the UpcastEngine class of the upCast Java API. This means that the code can be run without having access to the original pipeline document it was generated from. This may be useful when you need fine-grained control over error handling for each individual module’s execution step and/or need to dynamically execute additional code that cannot be integrated into a standard pipeline execution.

3.11. Export to XML…

Exports the current pipeline document as a human-readable XML source file.

This file is also used internally as the basis for the Ant task and Java Source export options, which are generated by appropriately configured XSLT transformations. With this export, you can create your own formats of export (e.g. customized Java code export or extended documentation generation).

3.12. Generate documentation…

Generates a self-contained HTML page with automatically generated documentation of the top-most pipeline document including things like commandline call syntax and Java call syntax, parameter descriptions and more.

4. Edit menu

4.1. Cut / Copy / Paste

The operations Cut, Copy and Paste are context-sensitive; their behavior depends on where the keyboard focus currently is:

text field

When the focus is on a text field, these methods work as usual.

pipeline modules list

When the focus is on a module in the pipeline modules list, that module’s complete definition is copied onto the clipboard in the form of an XML snippet. When you use Paste while the focus is on a module entry, the module description on the clipboard is read and a new module is inserted above the currently selected module, with all parameters set as for the module you copied. You can even copy modules conveniently across open pipelines this way.

4.2. Copy as UPL

When the focus is on a module in the pipeline modules list, this command will create UPL source code for running the selected module from UPL using the run-module() function and put it as text onto the clipboard. You can then insert it into a UPL code field within upCast or your favorite external editor where you are writing your UPL code.

4.3. Duplicate

This function only works when the focus is on a module in the pipeline modules list. In this case, it inserts a copy of the currently selected module directly below it.
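
The difference in insertion position between Paste (above the selected module) and Duplicate (directly below it) can be illustrated with plain list operations. The Python sketch below is a hypothetical model of the modules list and not upCast code; modules are reduced to dictionaries holding a module type identifier.

```python
import copy


def paste_above(modules: list, selected: int, clipboard: dict) -> None:
    """Paste inserts the clipboard module above the selected module."""
    modules.insert(selected, copy.deepcopy(clipboard))


def duplicate_below(modules: list, selected: int) -> None:
    """Duplicate inserts a copy of the selected module directly below it."""
    modules.insert(selected + 1, copy.deepcopy(modules[selected]))


pipeline = [{"type": "rtfimport"}, {"type": "xslt"}, {"type": "xmlexport"}]
duplicate_below(pipeline, 1)                        # second xslt after the first
paste_above(pipeline, 0, {"type": "pipelinevars"})  # new module at the top
order = [module["type"] for module in pipeline]
```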

5. Pipeline menu

5.1. Run

Runs the pipeline in the top-most pipeline window.

5.2. Simple View

With this toggle, you switch between the Simple View and Edit View of a pipeline configuration.

When checked, upCast shows its pipeline window in Simple View mode, hiding the actual pipeline implementation and showing only entry fields for the pipeline parameters that a typical user must supply.

When you want to edit the details of a pipeline, uncheck this item.

The state of this toggle is saved to the pipeline document and automatically restored when the document is opened. This means that for final distribution to your customers, you should check this item and then save the document again before packaging it into your distribution.

6. Help menu

6.1. Information…

Shows a window with detailed information on the execution environment of the topmost pipeline document and the upCast application, including version information on available XSLT processors, Java, loaded modules, license info etc. You may be asked by infinity-loop support for this info when tracking down problems you may have with upCast.

6.2. License Agreement

Shows the upCast License Agreement in a window.

6.3. Logfile

Shows a window with the external log file or a live view of log events as they are generated from within upCast.

The Source popup menu lets you choose between these two modes:

Logfile

shows the current contents of the log file on disk

Live Events

shows the log events in the system as they are generated from log sources within upCast

When showing Live Events, you can set a filter describing which log events generated by upCast should be displayed. This is done using the Filter text field. This setting is completely independent of the log level setting in upCast’s preferences. Several pre-defined settings are available from the associated popup menu, but you are free to specify any log event filtering expression you wish. The filter expression syntax is described in the section "Filter specification syntax" of the chapter on Logging Architecture and is the same as used in other places within upCast.

All log events are held indefinitely while the window is open or until you click on Clear Window, so you should not leave the window open unattended, as you will otherwise eventually run out of heap space. When the window is in Live Events mode, pipeline execution will slow down depending on the number of log events to display. There’s no performance penalty when the window is closed, as it then detaches itself from all log sources automatically.

With Save as Text…, you can save the current contents of the window to a text file. You may be asked by infinity-loop support for this info when tracking down problems you may have with upCast.

6.4. upCast Documentation…

Shows this upCast reference documentation manual in the host system’s default web browser.

6.5. UPL Documentation

Shows the UPL reference documentation in the host system’s default web browser.

6.6. Javadoc API Documentation…

Shows the upCast API documentation (in javadoc format) in the host system’s default web browser.

6.7. Send Feedback…

Opens a pre-configured email in your default email application, ready for you to add your problem report or question to the infinity-loop Support department. The email includes system information, which you can preview in the generated email and, if desired for privacy reasons, trim to your liking.

You should use this function whenever you want to report a bug or problem to infinity-loop.

Chapter 4. The variable system

upCast offers several variable realms. Realms are distinct, non-overlapping value storage spaces. Think of them as different buckets placed next to each other, labelled with the realm name.

Some of the realms are read-only, and some of them calculate the actual value of a variable at the time of retrieval access.

Here’s an overview of the different realms and their names (monospace bold grey print) available in upCast:

Variable realms and names in upCast

To get or set a variable, two components must be specified:

  • the realm

  • the variable name

A variable reference is resolved in an upCast parameter field by simply replacing the variable reference by the textual value of the variable referenced.

Note

It is important to always keep in mind that the variable resolution process is an utterly dumb textual replacement process (much like a #define works in the C programming language). Specifically, no quoting or unquoting is performed.

The result of a variable reference to a variable that does not exist or cannot be resolved is the variable reference itself.

A piece of text containing variable references is processed repeatedly for as long as the result keeps changing. This allows you, for example, to have references to the include realm resolved in already included content as well. Consequently, you must make sure that content that looks like a variable reference but must not be resolved is properly quoted (e.g. by doubling the $ sign). To avoid potential infinite recursion, this repeated resolution process is terminated if the result still changes after a certain number of iterations. The limit is currently set to 32 iterations by default. It can be changed by setting the Java property de.infinityloop.application.maxvarrecursion.
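
The repeated, purely textual resolution can be modelled in a few lines. The following Python sketch illustrates the behavior described above under simplifying assumptions; it is not upCast's implementation. The resolve function, the regular expression and the sample variables are invented for this example, modifiers and the $ quoting are not modelled, and the exact point at which upCast stops iterating may differ.

```python
import re

MAX_VAR_RECURSION = 32  # default limit named in the text above
REFERENCE = re.compile(r"\$\{([^:{}]+):([^{}#]+)\}")


def resolve(text: str, realms: dict, limit: int = MAX_VAR_RECURSION) -> str:
    """Textually replace references until the result is stable or the limit is hit."""
    def substitute(match):
        realm, name = match.group(1), match.group(2)
        # A reference that cannot be resolved evaluates to itself.
        return realms.get(realm, {}).get(name, match.group(0))

    for _ in range(limit):
        resolved = REFERENCE.sub(substitute, text)
        if resolved == text:
            break
        text = resolved
    return text


realms = {
    "pipeline": {"outDir": "${pipeline:jobDir}/out", "jobDir": "/data/job"},
    "module": {"loop": "${module:loop}x"},  # self-referencing on purpose
}
```

Resolving ${pipeline:outDir} takes two passes here because the first substitution introduces another reference. The self-referencing module variable never stabilizes, so only the iteration limit ends its processing.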

Naming restriction

All variable names that start with an upper-case letter are reserved for upCast’s own use.

You should therefore name your own variables so that they do not start with an upper-case letter, even if a likewise-named upCast-defined variable does not exist at that time. We might introduce it in a subsequent release, which could stop your pipeline from working correctly.

1. Variable reference syntax

The syntax to refer to a variable in a specific realm is similar to that of Ant, albeit with a twist:

${realm:name#modifier}

Note the special #modifier part: it lets you transform the stored value of a variable in specific ways before it is returned. This is most useful for file paths, e.g. to retrieve only the name of a file from an absolute path, its base name or just the path to the file.

Note

As with Ant, variable resolution is not recursive, i.e. you cannot write something like ${module:${pipeline:paramname}} to calculate the name of a module variable dynamically.
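
The reference syntax can be picked apart with a regular expression. The Python sketch below assumes a grammar of ${realm:name} with an optional #modifier part; the pattern, the VariableReference type and parse_reference are hypothetical and only approximate what upCast accepts.

```python
import re
from typing import NamedTuple, Optional

# Assumed grammar for this sketch: ${realm:name} or ${realm:name#modifier}
REFERENCE = re.compile(
    r"\$\{(?P<realm>[^:{}#]+):(?P<name>[^{}#]+)(?:#(?P<modifier>[^{}]+))?\}"
)


class VariableReference(NamedTuple):
    realm: str
    name: str
    modifier: Optional[str]


def parse_reference(text: str) -> Optional[VariableReference]:
    """Return the parts of a single variable reference, or None if it does not match."""
    match = REFERENCE.fullmatch(text)
    if match is None:
        return None
    return VariableReference(match["realm"], match["name"], match["modifier"])
```

Because braces are excluded from the name part, a nested reference such as ${module:${pipeline:paramname}} is rejected by this pattern, which mirrors the restriction stated in the note above.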

The components of a variable reference are:

realm

the realm of the variable; available values: application, pipeline, module, javaproperty, include

name

the name of the variable

modifier

for URL and file path variables only: extract elements of a path and/or convert the resulting variable value between local file system and URL format. The following modifiers are currently supported:

local

return the value of the variable in local file system format

url

return the value of the variable in URL format

localpath

return only the path component (without filename and without trailing file separator) of the value of the variable. If the variable is a folder, the value is returned unchanged.

urlpath

same as localpath, but returns the value in URL format

localname

returns only the file name component of the variable value in local format

urlname

same as localname, but returns the value in URL format

localextension

returns only the file extension (excluding the dot) of the variable value in local format; empty, when there is no file extension

urlextension

same as localextension, but returns the value in URL format

localbasename

returns the same value as localname, but with trailing dot and extension stripped if it exists

urlbasename

same as localbasename, but returns value in URL format

localbasenamepath

essentially, this is localpath + localbasename, i.e. the value of the variable minus extension (including trailing dot)

urlbasenamepath

same as localbasenamepath, but returns value in URL format

Example 4.1. Modifier sample results

With SourceFile having a value of "C:\Documents and Settings\upCast\The file.xml", the following variable references with modifiers will evaluate to:

${SourceFile#local}

C:\Documents and Settings\upCast\The file.xml

${SourceFile#url}

file:///C:/Documents%20and%20Settings/upCast/The%20file.xml

${SourceFile#localpath}

C:\Documents and Settings\upCast

${SourceFile#urlpath}

file:///C:/Documents%20and%20Settings/upCast

${SourceFile#localname}

The file.xml

${SourceFile#urlname}

The%20file.xml

${SourceFile#localextension}

xml

${SourceFile#urlextension}

xml

${SourceFile#localbasename}

The file

${SourceFile#urlbasename}

The%20file

${SourceFile#localbasenamepath}

C:\Documents and Settings\upCast\The file

${SourceFile#urlbasenamepath}

file:///C:/Documents%20and%20Settings/upCast/The%20file
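
The results in the example above can be reproduced with Python's standard library, which may help when checking what a modifier will yield for a given path. This is a sketch for Windows-style paths only: apply_modifier and to_url are hypothetical helpers, the special case for folders mentioned under localpath is not modelled, and upCast's own conversion rules may differ in details such as which characters are percent-encoded.

```python
import ntpath
from urllib.parse import quote


def to_url(local: str) -> str:
    """Convert an absolute Windows path to file: URL form."""
    return "file:///" + quote(local.replace("\\", "/"), safe="/:")


def apply_modifier(value: str, modifier: str) -> str:
    """Apply one modifier to a Windows file path given in local format."""
    path, name = ntpath.split(value)
    basename, extension = ntpath.splitext(name)
    parts = {
        "": value,
        "path": path,
        "name": name,
        "extension": extension.lstrip("."),
        "basename": basename,
        "basenamepath": ntpath.join(path, basename),
    }
    for fmt in ("local", "url"):
        key = modifier[len(fmt):]
        if modifier.startswith(fmt) and key in parts:
            part = parts[key]
            if fmt == "local":
                return part
            # Full paths get the file:/// prefix; bare names are only encoded.
            return to_url(part) if ntpath.isabs(part) else quote(part)
    raise ValueError(f"unknown modifier: {modifier}")


source_file = r"C:\Documents and Settings\upCast\The file.xml"
```

Calling apply_modifier(source_file, "urlbasenamepath"), for instance, yields the last value listed in the example.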


Let’s have a look at the various realms in more detail:

2. The global realm

This realm is not yet available and will be implemented in a later release of upCast.

3. The application realm

This realm is read-only.

This realm includes upCast application-global values.

The following variables are currently defined:

variable name

description

SupportFolder

path to the (OS/system-specific) support files folder

BundledResources

path to the resources folder bundled with the application distribution (when it was installed using one of the system-specific distribution packages)

Logfile

the path to the external logfile as calculated by the application and/or set in the java system property de.infinityloop.application.logfile

By default, all file path values returned are in URL format. You can use all available modifiers on them, of course, to change format or extract parts from them.

Example 4.2. 

To retrieve the location of the application’s support folder on the system it is running on, use:

${application:SupportFolder}

4. The environment realm

This realm is read-only.

This realm includes upCast pipeline-global environment values. Most of them are virtual in the sense that they are not actually stored but reflect the state of the execution environment at the time they are recalled.

The following variables are currently defined:

variable name

Java type

description

version

Integer

the application version (0x0Mmr format)

version-string

String

the application version in "M.m.r" format

build

Integer

the build number

build-timestamp

String

the timestamp string of the build in the format "dd-mm-yy hh:mm:ss [+|-]zzzz"

license-features

List

list of Strings of features included in the current license; features that are not active are enclosed in parentheses ‘(’ and ‘)’

license-features-valid

List

list of Strings that only includes features in the current license that are valid at the time of the query

license-info

String

a string describing the current license

license-feature-expdays-featurename

Integer

number of days until the license feature featurename expires

dir-installation

String

application installation folder

xml-catalogs

List

list of Strings identifying the locations of currently active XML catalog files in the pipeline

xml-xerces-version

String

version information of the included Xerces parser

xslt-xalan-version

String

version information of the included Xalan XSLT processor

xslt-saxon-version

String

version information of the included Saxon 9.x XSLT 2 processor

xslt-saxon6-version

String

version information of the included Saxon 6.x XSLT 1 processor

wordlink-version

Integer

version of the active WordLink component; returns null when WordLink is not installed or active

wordlink-wordversion

Integer

version of Microsoft Word that WordLink is currently linking to; returns null when Word is not installed on this machine or WordLink is not functional

wordlink-binarypath

String

absolute path to the application used for implementing the WordLink functionality; returns null when WordLink is not installed or active

mathlink-version

Integer

version of the active MathLink component (implementing the link to MathType 5); returns null when MathLink is not installed or active

mathlink-dllversion

Integer

version of the MathType DLL used for implementing MathLink; returns null when MathLink is not installed or active

progress-sublabel

String

the text currently displayed in the progress bar’s sub-label

progress-label

String

the text currently displayed in the progress bar’s label

progress-task-current

Long

the ordinal number (1-based) of the currently executed module task in the pipeline

progress-task-count

Long

the total number of tasks defined in the current pipeline

progress-task-current-max

Long

the maximum value for completion indication of the current task

progress-task-current-value

Long

the current value of completion for the currently running task; the task is completed when this value is equal to progress-task-current-max

dir-support

String

the folder searched for application support files

dir-licenses

String

the folder searched for license files

logfile

String

the absolute path of the log file written to

pipeline-gui

Boolean

true when this pipeline's GUI is shown

This means that the pipeline must be a top-level pipeline (see pipeline-toplevel) and that upCast must be running in GUI mode (in contrast to being run as a commandline tool or being controlled via its Java API)

pipeline-toplevel

Boolean

true when this pipeline is the top-level pipeline executing

This means that the pipeline is not one that is executed within an External Pipeline Processor as a sub-pipeline

pipeline-info

String

returns the contents of the pipeline info window as string

This information may prove useful for debugging, as it contains the complete running environment information of this pipeline in human readable form

pipeline-version

String

returns the compatibility version of the current pipeline as string or the empty string when the info is not available

This information is a copy of the respective parameter setting in the Pipeline Info > Settings tab

pipeline-build

String

returns the build of the current pipeline as string or the empty string when the info is not available

This information is a copy of the respective parameter setting in the Pipeline Info > Settings tab

version-latest

Integer

the build number of the latest available version of this application

This information is retrieved from infinity-loop’s servers by fetching the URL http://versioncheck.upcast.de/upcast7.plist.

When there is no newer version available, this returns 0.

When the information could not be retrieved (e.g. due to a server error or if there is no active connection to the internet), null is returned.

By default, all file path values returned are in URL format. You can use all available modifiers on them, of course, to change format or extract parts from them.
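Two of the values in the table are encoded rather than self-explanatory: version and version-latest. The following Python sketch shows one possible way to interpret them. It rests on an assumption that this manual does not state explicitly, namely that each of M, m and r in the 0x0Mmr format occupies exactly one hexadecimal digit. The function names are hypothetical, and Python's None stands in for the null value.

```python
def decode_version(value: int) -> str:
    """Split a 0x0Mmr-style integer into "M.m.r".

    Assumption: major, minor and revision each occupy one hex digit.
    """
    major = (value >> 8) & 0xF
    minor = (value >> 4) & 0xF
    revision = value & 0xF
    return f"{major}.{minor}.{revision}"


def update_status(version_latest):
    """Interpret the three documented outcomes of version-latest."""
    if version_latest is None:
        return "unknown (version information could not be retrieved)"
    if version_latest == 0:
        return "no newer version available"
    return f"newer version available (build {version_latest})"


print(decode_version(0x0800))
print(update_status(None))
print(update_status(0))
```

Checking for the null case first matters: a failed lookup and "no newer version" are different situations and should not be reported to the user in the same way.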

Example 4.3. 

To get information on the version of Xalan currently in use by upCast RT, write:

${environment:xslt-xalan-version}

which might return the value "Xalan Java 2.7.1".


To access these environment values from UPL, use the environment namespace as you would for ordinary UPL variables. Java types as listed in the table above are coerced to the respective UPL types.

With a namespace definition of

#namespace environment "http://www.infinity-loop.de/namespace/upcast-realm/environment";

the code

println( $environment:dir-licenses );

might print the following on the console:

/Users/demo/Library/Application Support/infinity-loop/upCast RT/Licenses

and the code

println( $environment:license-features );

might print the following to the console:

{"rtfimportGUI","rtfimportAPI","rtfexportGUI","rtfexportAPI","uplGUI","uplAPI"}
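Because inactive features are reported in parentheses, code that consumes the license-features list usually needs to separate the two groups first. The following Python sketch illustrates that step on a made-up list; the function name split_features and the sample data are hypothetical, and only the parentheses convention is taken from the table above.

```python
def split_features(features):
    """Separate active features from inactive ones.

    Inactive features are wrapped in "(" and ")", as described for
    the license-features variable.
    """
    active, inactive = [], []
    for feature in features:
        if feature.startswith("(") and feature.endswith(")"):
            inactive.append(feature[1:-1])   # drop the parentheses
        else:
            active.append(feature)
    return active, inactive


# Hypothetical sample: one export feature is present but not active.
sample = ["rtfimportGUI", "rtfimportAPI", "(rtfexportGUI)", "uplGUI"]
active, inactive = split_features(sample)
print("active:", active)
print("inactive:", inactive)
```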

5. The pipeline realm

It is often useful to store values that several modules will need as pipeline variables. Examples are the source document to process, the destination folder, the folder where images will be stored or the folder where temporary files should be created if needed by the pipeline.

The pipeline realm contains variables that are available to all modules in a specific pipeline. Each pipeline has its own set of pipeline variables. Modules can only access pipeline variables of the pipeline they are a member of.

Important

The set of pipeline variables is cleared before each execution of a pipeline, with the exception of the special, pre-defined, read-only variables described in the section Special pre-defined pipeline variables of the chapter Pipeline-level Settings.

6. The module realm

This realm includes all parameters of a single module in a pipeline. This realm can only be accessed from within that module, and only the parameters of the currently executed module at the time of reference resolution can be accessed.

Important

Referencing module variables is generally not recommended since upCast has no defined order of variable resolution and will not determine a suitable one by itself. Referring to module variables can therefore lead to infinite loops or to unresolved references.

7. The javaproperty realm

This realm is read-only, with the exception of the UPL execution context, where you can also set variables in that realm.

The javaproperty realm contains all currently defined Java system properties, either those pre-defined by the Java Virtual Machine (like user.dir or user.home) or properties explicitly set on launch of the VM running the application.

Example 4.4. 

${javaproperty:user.home}

retrieves the path to the current user’s home directory.

${javaproperty:user.dir#url}

retrieves the path to the current directory in URL format by use of the #url modifier.


8. The include realm

This realm is read-only.

The include realm returns the contents of the file specified as the name of the variable. The syntax is as follows:

${include:/absolute/filepath/to/file.ext}
${include:relative/path/to/file.ext}

A relatively specified path is always considered to be relative to the value of ${pipeline:base}, i.e. the base URL of the pipeline.

An include reference can take parameters, for example to specify the encoding to be used for reading the file. The variable reference syntax can therefore take the following, extended form:

${include( paramname: "value" [, paramname: "value" ]* ):filepath}

The following table lists the possible parameters that can be specified for an include reference:

parameter name

value

encoding

the (Java-) name of the encoding to be used for reading the file

When this parameter is not specified, the platform’s default encoding is used.

source

lets you choose where the data to be included originates

file

the filepath component specifies a physical file (this is the default when the parameter is not specified)

variable

the filepath component specifies a variable identifier from which to get the included data. This option is useful when you need to pass literal code or code fragments (still to be parsed by upCast later) from a Simple View component into e.g. the External Pipeline Processor's Sub-pipeline Parameters field.

fallback

a string value which is the fallback replacement value when the include cannot be performed due to an error, e.g. if the referenced file does not exist, is not readable or has an encoding error

Normally, when a variable cannot be resolved in the include realm, the variable reference is left verbatim. With a fallback value, you can include a file into some other piece of code only if it is present (by setting fallback to the empty string), or insert some default code when the file is not present.

Example 4.5. 

1. The value of the variable reference

${include( encoding: "UTF-8" ):Resources/entity.map}

is the text contents of the file pipeline-basedir/Resources/entity.map, read with UTF-8 encoding.

2. The value of the variable reference

${include( source: "variable" ):pipeline:DestinationFolder}

is the text contents of the variable DestinationFolder in the pipeline realm.

3. Assuming the file pipeline-basedir/doesnotexist.txt does not exist or is not readable, the value of the variable reference

${include( fallback: "" ):doesnotexist.txt}

is the empty string "", the value of

${include:doesnotexist.txt}

is the string "${include:doesnotexist.txt}", and the value of

${include(fallback: "println('File does not exist!')"):doesnotexist.txt}

is the string "println('File does not exist!')".
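The resolution rules for include references described in this section can be summarized in a short Python sketch. It is an illustration of the documented behaviour only and not how upCast implements it: the names resolve_include, INCLUDE_RE and PARAM_RE are hypothetical, the variables dictionary stands in for upCast's variable store, and base_dir is a local folder standing in for ${pipeline:base}. Python's platform default encoding is used when no encoding parameter is given.

```python
import re
from pathlib import Path

# ${include( name: "value", ... ):target} with an optional parameter list.
INCLUDE_RE = re.compile(
    r'^\$\{include'
    r'(?:\(\s*(?P<params>(?:[^"()]|"[^"]*")*)\))?'
    r':(?P<target>.*)\}$',
    re.DOTALL,
)
PARAM_RE = re.compile(r'([\w-]+)\s*:\s*"([^"]*)"')


def resolve_include(reference, base_dir, variables=None):
    """Resolve one include reference, mimicking the documented rules."""
    match = INCLUDE_RE.match(reference)
    if match is None:
        return reference                    # not an include reference
    params = dict(PARAM_RE.findall(match.group("params") or ""))
    target = match.group("target")
    try:
        if params.get("source", "file") == "variable":
            return (variables or {})[target]
        # A relative target is resolved against base_dir; an absolute
        # target replaces it (pathlib join semantics).
        path = Path(base_dir) / target
        return path.read_text(encoding=params.get("encoding"))
    except (OSError, UnicodeError, LookupError):
        # With a fallback, use it; otherwise leave the reference verbatim.
        return params.get("fallback", reference)


# Hypothetical variable value for demonstration purposes.
variables = {"pipeline:DestinationFolder": "file:///tmp/out/"}
print(resolve_include(
    '${include( source: "variable" ):pipeline:DestinationFolder}',
    ".", variables))
```

Note how the error branch mirrors the two documented outcomes: the fallback string when one is given, the unchanged reference otherwise.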


9. Parameter and Variable Types

Parameters and variables are internally stored using standard, appropriate Java or UPL object types. Some parameters can take several different types, which, however, can only be set using native Java code using the upCast API or UPL functions. Parameters that can accept more than one of the following basic types will have this mentioned explicitly and in detail in their respective description section.

The basic parameter types are:

type name in this document

corresponding Java type (class or interface)

corresponding UPL type

Bool

java.lang.Boolean

Bool

Integer

java.lang.Integer

Numeric

Double

java.lang.Double

Numeric

String

java.lang.String

String

List

java.util.List

List

Object

java.lang.Object

Chapter 5. Application-level Settings

Some settings are global to the upCast application and (some of them optionally) affect all pipeline documents loaded.

1. Application Preferences

These can be set in the upCast Preferences dialog, available under the application menu (Mac OS X) or the File menu (other platforms).

To make the settings active, click Apply or the window’s close button.

The parameters are grouped into tabs.

1.1. Application Settings

Create new document on launch when no others are open

When selected, a new default pipeline document will be created on upCast launch when no other windows (e.g. from a previous, saved session) are open.

Re-open documents that were open when last quitting

When checked, all documents that were open when upCast was last quit will be re-opened in their previous locations.

Remember the most recent ___ pipeline documents

Here, you can enter the number of recently opened documents that should be listed in the File > Open Recent menu. Decreasing the value will forget any document listings beyond that new number.

Tip

To clear the File > Open Recent menu, set the number to 0, close the application preferences window by clicking Apply, then re-open and enter the number of documents you want to be remembered. Setting the value to 0 temporarily will clear the entire internal list of documents, effectively clearing the menu.

Log filter

Here, you can specify a log event filter expression. Only log events passing the filter expression are actually written to the external log file. Several often-used filter expressions are available from the associated popup menu; however, you are free to define your own customized filter expressions. The filter expression syntax is described here.

Check for updates on launch

When checked, upCast will contact the infinity-loop version server to check whether a newer version of upCast is available for download. If there is, you will be notified in an info dialog.

Check for updates now…

Clicking this button will let you manually check for updates. This is particularly useful when Check for updates on launch is not checked.

When switching between Simple View and Edit View …

This parameter lets you set the behaviour with regard to window positioning and sizing when switching between Simple View and Edit View of pipeline windows. upCast can remember location and size of the window in each of the two modes and restore those settings when switching between them. The following behaviours are available:

keep current window size and position

this option tries to keep the current window size and position when switching between Simple View and Edit View if possible (This is the default and mirrors upCast's pre-7.5 behaviour.)

remember and restore window size

this option restores the last size of the window in each of the two modes when switching between them, keeping the current window position (upper left corner) fixed

remember and restore window size and position

this option restores the last size and position of the window in each of the two modes when switching between them

Window sizes and positions for each mode are saved to the pipeline file (*.ucdoc and *.ucpar) and therefore are available again when re-opening it.

Pipeline Template Paths

In this text field, you can specify paths where upCast will look for pipeline templates, one path per line. Use this if you store personal or company templates at a central place on your network and want to make those templates available automatically within upCast.

The default path for templates, which points to the templates copied to disk during installation, is

${application:BundledResources}/templates

You must include this path in this field if you want to have access to the application-included templates. On the other hand, if you want some users to not have access to the default templates but want them to be restricted to your specific, customized templates only, make sure that in those users’ installations, the default path is not included in the path list.

To add a path to the list, click Add Path… and navigate to the folder containing the pipeline template definition folders.

Empty lines or lines starting with // are considered comments and are discarded during parsing.
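The comment and empty-line rule can be illustrated with a small Python sketch. It is not upCast's parser: the function name parse_template_paths is hypothetical, the second path in the sample is made up, and trimming surrounding whitespace before testing a line is an assumption of this sketch.

```python
def parse_template_paths(field_text):
    """Return the effective template paths from the field's text."""
    paths = []
    for line in field_text.splitlines():
        stripped = line.strip()   # assumption: surrounding whitespace is ignored
        if not stripped or stripped.startswith("//"):
            continue              # empty lines and // comments are discarded
        paths.append(stripped)
    return paths


field = """\
// templates bundled with the application
${application:BundledResources}/templates

// company templates (hypothetical location)
file:///Volumes/Shared/upcast-templates
"""
paths = parse_template_paths(field)
for path in paths:
    print(path)
```

Variable references such as ${application:BundledResources} are left untouched here; upCast resolves them separately.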

Note

You can use variable references from the include, application and javaproperty realms, but you cannot use the pipeline realm since the setting is application-global.

1.2. Catalogs

upCast supports the use of catalog files. In its simplest form, a catalog file is a mapping between PUBLIC DTD identifiers and the location of a physical copy of that specific DTD (or, more generally, entity). The upCast application supports the catalog file format as defined in http://www.oasis-open.org/specs/tr9401.html as well as XML Catalogs.

To add a catalog file, choose Add Catalog… and select the catalog file to add from the file system. The new catalog will be available to all modules immediately after closing the preferences window.

To remove a catalog, just delete its entry line.

Catalogs are considered in the order displayed.

OASIS catalog files are read with platform encoding, XML catalog files with the encoding specified in their XML declaration.

By clicking Insert upCast defaults, code is added to pick up any upCast default catalog possibly delivered with the application. You should have that entry in place for best performance.

Note

You can override the global Catalog setting individually for each pipeline.

1.3. Font Configuration

Font configuration

Specify the source code for the stdfonts.config override that should be used for this pipeline.

1.4. Encodings

Custom Encodings

To add a custom encoding file, choose Add Encoding… and select the custom encoding file (*.encoding) to add from the file system. Each line in the field specifies a custom encoding location.

To add a folder where upCast should pick all contained custom encoding files, choose Add Encodings Folder… and select it.

To remove a custom encoding entry, delete the text line containing its location specification.

1.5. License

This panel is for importing an upCast license file and reviewing current licensing status.

Certain module types require specific license features. The features available in the currently active license are listed in the license features table at the bottom of this panel. Please refer to the individual module’s documentation to check which license feature it requires to be fully (or: at all) functional.

Import new license

Clicking this button brings up a file chooser where you can find and select the license file you were sent by infinity-loop’s licensing department upon your license request or purchase. You are offered the option to store this license in upCast’s special Licenses folder, so it will be available to you automatically at launch.

Pick from available licenses

Clicking this button shows you all licenses from upCast’s Licenses folder and any licenses packaged into the application itself that can be used for this version of upCast. This allows you, for example, to switch between evaluation and full licenses or between licenses with different features.

Chapter 6. Pipeline-level Settings

1. Typographical conventions

Parameters will be described using the following typography:

Name

DeleteEmpties

Java symbol

kDeleteEmptiesParamName

Type

Boolean

Value

false, true

Name gives the internal java.lang.String name of the parameter, which is used for storing in preferences files, and can be used in the Java API as String. However, in Java, the use of the Java symbol is highly recommended instead.

Java symbol names the Java constant definition for the parameter’s Name. All constants are defined in the class de.infinityloop.common.Params.

Type specifies the recommended Java type to use when programming against the API. In the GUI version, that is taken care of automatically. Also, when using alternative interfaces like an Ant task, which only allows passing arguments as character strings, upCast tries to perform an appropriate cast. You should therefore make sure that the data you provide in these cases can be cast to the specified type, as otherwise the conversion will fail or produce incorrect results at runtime.

Value describes possible value ranges, supported keywords or other specifics about that parameter’s range.

2. Special pre-defined pipeline variables

2.1. PipelineBase, base

The pre-defined pipeline variables PipelineBase and base (deprecated) are automatically made available in the GUI version of upCast (read-only) and contain the path to the current pipeline document (*.ucdoc), excluding the actual name, in URL format (including trailing slash ‘/’). The pipeline document must be saved to a file on disk for upCast to be able to determine this property. If the path cannot be determined, the current directory is returned instead (Java property user.dir).

When using the upCast RT Java API (i.e., the UpcastEngine class) directly, this value must be explicitly set before working with pipelines that contain any references to values dependent on ${pipeline:PipelineBase}. Only use the setPipelineBaseURI() API method (class UpcastEngine) for setting the value for this pipeline variable.

You can use this to make the configuration independent of its actual location in the file system by specifying paths relative to the base variable, and storing any resources needed for the pipeline in subdirectories of this base URI.

For distributing a configuration, we recommend putting it at the root of a folder with required resources in sub-folders according to the following layout:

example folder layout for a distribution

Name

PipelineBase

Java symbol

kPipelineBaseParamName

Type

String

Value

absolute path to folder in which the current configuration file is located (if loaded via the GUI; automatically set) or the folder of a configuration distribution root in URL format

2.2. PipelineURI

This variable holds the full URI (as a file:-URL) of the pipeline document (*.ucdoc) implementing the current pipeline.

Name

PipelineURI

Java symbol

kPipelineURIParamName

Type

String

Value

absolute URI (as a file: URL) of the pipeline document (*.ucdoc) implementing the current pipeline
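The relationship between PipelineBase and PipelineURI can be made concrete with a short Python sketch. It only illustrates the documented shape of the two values for a POSIX-style path; it is not the upCast API, and the function name pipeline_locations and the sample path are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def pipeline_locations(document_path):
    """Return (PipelineBase, PipelineURI) for an absolute path to a *.ucdoc file."""
    doc = PurePosixPath(document_path)
    # PipelineURI: the full file: URL of the pipeline document.
    pipeline_uri = "file://" + quote(doc.as_posix())
    # PipelineBase: the containing folder, without the document name,
    # with a trailing slash.
    pipeline_base = "file://" + quote(doc.parent.as_posix()) + "/"
    return pipeline_base, pipeline_uri


base, uri = pipeline_locations("/Users/demo/My Pipelines/convert.ucdoc")
# Thanks to the trailing slash, relative resources can simply be appended.
resource = base + "Resources/entity.map"
print(base)
print(uri)
print(resource)
```

The trailing slash is what allows a reference like ${pipeline:PipelineBase}Resources/entity.map to be written without an extra separator.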

2.3. ParamBase

This variable holds the path to the current parameter set document (*.ucpar), excluding the actual name, in URL format (including trailing slash ‘/’). For regular pipeline documents, the contents of this variable is the same as that of PipelineBase.

Name

ParamBase

Java symbol

kParamBaseParamName

Type

String

Value

absolute path to folder in which the current parameter set file is located (if loaded via the GUI; automatically set) in URL format

2.4. ParamURI

This variable holds the full URI (as a file:-URL) of the current parameter set document (*.ucpar). For regular pipeline documents, the contents of this variable is the same as that of PipelineURI.

Determining if the current document is a pipeline or parameter set

This definition of the contents of the ParamURI pipeline variable can be used in UPL to determine whether the currently running application is run directly from a pipeline document (ucdoc) or via a parameter set (ucpar) with code like the following:

#namespace pipeline "";
  ...
if( ends-with( $pipeline:ParamURI, "ucpar" ) ) {
  /* we’re running from a parameter set document */
} else {
  /* we’re running from a regular pipeline document */
}

Name

ParamURI

Java symbol

kParamURIParamName

Type

String

Value

absolute URI (as a file: URL) of the current parameter set document (*.ucpar); for regular pipeline documents, the same value as PipelineURI

2.5. PipelineInstanceId

This variable holds a UUID string identifying this particular running instance of a pipeline.

Identifying a certain pipeline object instance is necessary in some upCast XSLT extension functions which need to retrieve information from the pipeline object that is running the transformation. The value in this variable is used for these identification purposes and must be passed as a stylesheet parameter when needed there.

Name

PipelineInstanceId

Java symbol

kPipelineInstanceIdParamName

Type

String

Value

3. Pipeline Settings

Click the Pipeline Settings… button in the pipeline window to access the window for setting pipeline-level settings.

To make the settings active, click Close or the window’s close button.

Many of these settings allow you to override the settings made in the upCast preferences.

Tip

When you use the upCast GUI as the prototyping and testing environment for your pipeline development, but intend to later export the pipeline in the form of Java source code or an Ant task, we recommend overriding the global settings with pipeline-specific settings. This gives you consistent output for a specific pipeline document instead of making it dependent on the global application preferences in effect at the time of export.

Access the various settings by choosing the respective tab:

3.1. Pipeline Parameters

Here, you can set up a description of parameters you want your pipeline to be dependent on. The information provided here is used in three ways:

  1. to create a simplified view and data entry UI objects for the user of a pipeline, where you want to hide the details of the implementation (i.e. the kind and order of modules used, calculations etc.),

  2. to define the parameters a pipeline accepts and requires to be able to run from the commandline or via the Java API functions, including the ability to check those parameters for legal values, and

  3. to provide documentation for the semantics of a parameter, which is shown in the form of help tags in the UI, as text on the commandline, and formatted as an HTML document when generating the pipeline documentation.

This is a convenient feature to distribute complete, parameterized pipeline solutions to your customers in an easy-to-use, packaged way. All they need to do is open the pipeline, supply the requested parameters, and click the Run button. They are therefore completely shielded from the (possibly many) modules building up the pipeline and their complexity.

Interface element and parameter definitions

The description code you provide here serves two purposes:

  1. It is the basis for determining the number and name of pipeline parameters.

  2. It specifies the kind of form display element for each of these parameters.

Basically, you specify the name of the pipeline variable you wish to have set to the specified pipeline parameter’s value. This value is supplied as initial, pre-set pipeline variable to your pipeline definition.

Important

The pipeline parameters are only set when the GUI is in Simple View mode. When in full editing mode, the pipeline is executed with a completely clean set of pipeline variables (except for the base variable) – unless you check the Set specified parameter defaults when running a pipeline in edit view option (see above). In the latter case, the default values for those parameters that have a default specified are set.

Before the defining code is interpreted, upCast resolves any contained variable references for the following realms and in that order:

  1. include

  2. javaproperty

  3. application

You cannot (for obvious reasons) access variables in the pipeline or module realm.

Tip

You can use the include variable reference to your advantage in projects where you have to create similar pipelines that essentially should have the same Simple View definitions. To keep those in sync, you can use an external file holding the parameter definition code, then include it in all pipelines that should show the same UI and have the same parameters. You then only need to update that single external file, and the UI definitions are updated automatically in all pipelines that include it.

upCast offers several types of UI elements for parameter entry: a decorating label, a text field or box, a filechooser, a popup menu and a checkbox, each one with its own set of dedicated properties.

You must assign one of these entry types to each pipeline parameter you need. The syntax for describing the properties is based on a CSS rule set: The selector part takes the form of an element selector and supplies the name of the pipeline variable to set. The declaration block part specifies the specific display and behavioural properties for that UI element.

Here are the properties which you can set for each of the following available types (option values are case-sensitive!):

3.1.1. Text label

This UI element creates static text. You can use this for headings, parameter grouping or parameter descriptions for the pipeline GUI users.

Note

We recommend creating text labels with an ID type of selector, since using an element selector would create a pipeline variable by that name (and reserve a likewise named pipeline variable).

Using an ID type of selector prevents this from happening, and the label will just serve to show text in the UI without any further effects.

label

type

label

text

the text to display in the label

font-family

the name of the font to use; when not specified, the system’s default label font is used

font-weight

normal | bold

font-style

normal | italic

font-size

size of the font; when not specified, the system’s default label font size is used

color

the text color; must be a CSS 2.1 color value

background-color

the background color; must be a CSS 2.1 color value

initialize-when

never | unset | always | <string>

Defines parameter values setting and initialization behaviour when the pipeline is run as a sub-pipeline from within an External Pipeline Processor (see there for details); the default value is unset

hidden

when true, this control is not displayed. The default value is false.

locked

when true, this control's value cannot be edited as long as the pipeline has an edit-lock password defined and the edit lock is in effect. The default value is false.

Example 6.1. 

#myLabel {
    type: label;
    text: "Simple View Sample";
    font-size: 20pt;
    font-weight: bold;
    color: olive;
}

creates a label with 20pt font size, bold text and olive text color.


3.1.2. Text field

This UI element creates a text field for arbitrary text.

It will create a parameter and pipeline variable of type String.

text

type

text

label

text to display as label at the very left

persistent

when true, current values are saved to the pipeline document and re-loaded when the document is next opened

when false, the value is not stored in the document and is assigned the default value (if specified) when the document is next opened

The default value is false.

default

the default value for the parameter, if newly created or persistent is false

description

the string specified here is displayed as tool-tip text in the UI when hovering over the input element and also used for descriptive messages when running in commandline mode

required

(only used in upCast commandline mode) when true, this means that specifying a value for this parameter is required to run the pipeline

when false, that parameter is not required to be specified but instead, its default value should be used (if specified) or not be set at all if there is no default value

The default value is true.

lines

the number of lines of the text field, the default is 1

postfix

the text to display after the text entry field; use this e.g. for displaying a value unit like "dpi" to let the user know the semantics of the number entered in the text field

initialize-when

never | unset | always | <string>

Defines parameter values setting and initialization behaviour when the pipeline is run as a sub-pipeline from within an External Pipeline Processor (see there for details); the default value is unset

hidden

when true, this control is not displayed. The default value is false.

locked

when true, this control's value cannot be edited as long as the pipeline has an edit-lock password defined and the edit lock is in effect. The default value is false.

Example 6.2. 

headerText {
    type: text;
    label: "Header Text:";
    persistent: true;
    default: "My Publication";
    lines: 2;
}

creates a field to input header text used in the pipeline. The pipeline variable created will be named headerText, and values the user inputs will be stored across document openings. The input field will show two lines of text and will be pre-populated with the text "My Publication" on initial creation.
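The interplay of the required and default properties in commandline mode can be sketched in Python as follows. This only illustrates the rules described above and is not upCast code: the function name resolve_parameter and the UNSET marker are hypothetical, and treating a missing required value as an error is an assumption about how the requirement is enforced.

```python
UNSET = object()  # hypothetical marker: the variable is not set at all


def resolve_parameter(name, supplied=None, default=None, required=True):
    """Return the value a parameter would end up with in commandline mode."""
    if supplied is not None:
        # A value given on the commandline always wins.
        return supplied
    if required:
        # Assumption: a missing required value prevents the pipeline from running.
        raise ValueError(f"parameter '{name}' is required but was not specified")
    # Not required: fall back to the default, or leave the variable unset.
    return default if default is not None else UNSET


print(resolve_parameter("headerText", default="My Publication", required=False))
# prints: My Publication
```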


3.1.3. File chooser, folder chooser

This UI element creates a text field for entering a file or folder path in local or URL format. It also displays a button to pick a file or folder using the system's file chooser UI.

It will create a parameter and pipeline variable of type String.

filechooser

type

filechooser

label

text to display as label at the very left

persistent

when true, current values are saved to the pipeline document and re-loaded when the document is next opened

when false, the value is not stored in the document and is assigned the default value (if specified) when the document is next opened

The default value is false.

default

the default value for the parameter, if newly created or persistent is false

description

the string specified here is displayed as tool-tip text in the UI when hovering over the input element and also used for descriptive messages when running in commandline mode

required

(only used in upCast commandline mode) when true, this means that specifying a value for this parameter is required to run the pipeline

when false, that parameter is not required to be specified but instead, its default value should be used (if specified) or not be set at all if there is no default value

The default value is true.

mode

open

displays a chooser for opening a file

save

displays a chooser for saving a file

folder

displays a chooser for picking a folder

file-or-folder

displays a chooser for picking a file or a folder

format

local

converts the chosen filepath to the local file naming convention

url

converts the chosen filepath to a URL

initialize-when

never | unset | always | <string>

Defines parameter values setting and initialization behaviour when the pipeline is run as a sub-pipeline from within an External Pipeline Processor (see there for details); the default value is unset

hidden

when true, this control is not displayed. The default value is false.

locked

when true, this control's value cannot be edited as long as the pipeline has an edit-lock password defined and the edit lock is in effect. The default value is false.

Example 6.3. 

SourceFile {
    type: filechooser;
    label: "Source file:";
    persistent: true;
    mode: open;
    format: url;
}

creates a field with a button to call a file chooser. The pipeline variable created will be named SourceFile, and values the user inputs will be stored across document openings. The file chooser will allow the user to pick files only, and the result will be stored in URL format in the editable input field.


3.1.4. List of files or folders

This UI element creates a text field for entering a list of file or folder paths (one per line) in local or URL format. It also displays a button to add a file or folder using the system's file chooser UI at the end of the current list.

It will create a parameter and pipeline variable of type List consisting of one line each of the input field as String value (in the displayed order).

filelist

type

filelist

label

text to display as label at the very left

persistent

when true, current values are saved to the pipeline document and re-loaded when the document is next opened

when false, the value is not stored in the document and is assigned the default value (if specified) when the document is next opened

The default value is false.

default

the default value for the parameter, if newly created or persistent is false

description

the string specified here is displayed as tool-tip text in the UI when hovering over the input element and also used for descriptive messages when running in commandline mode

required

(only used in upCast commandline mode) when true, this means that specifying a value for this parameter is required to run the pipeline

when false, that parameter is not required to be specified but instead, its default value should be used (if specified) or not be set at all if there is no default value

The default value is true.

mode

open

displays a chooser for opening a file

save

displays a chooser for saving a file

folder

displays a chooser for picking a folder

file-or-folder

displays a chooser for picking a file or a folder

format

local

converts the chosen filepath to the local file naming convention

url

converts the chosen filepath to a URL

lines

the number of entries (=lines) of the text field in the display, the default is 4

initialize-when

never | unset | always | <string>

Defines parameter values setting and initialization behaviour when the pipeline is run as a sub-pipeline from within an External Pipeline Processor (see there for details); the default value is unset

hidden

when true, this control is not displayed. The default value is false.

locked

when true, this control's value cannot be edited as long as the pipeline has an edit-lock password defined and the edit lock is in effect. The default value is false.

Example 6.4. 

inputFiles {
    type: filelist;
    label: "Source files:";
    persistent: true;
    mode: open;
    format: url;
    lines: 5;
}

creates an entry field for a list of file specifications where each single line corresponds to one list item, i.e. here: a file path. An empty line creates a list item consisting of the empty string. The pipeline variable created will be named inputFiles, and values the user inputs will be stored across document openings. The file chooser will allow the user to add files only, and the result will be stored in URL format in the editable input field.
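The line-to-item mapping described above can be illustrated with a few lines of Python. This is a sketch of the rule only, not of upCast's implementation; the function name field_to_list and the sample paths are hypothetical, and how a trailing line break at the very end of the field is treated is not specified in the text.

```python
def field_to_list(field_text):
    """Split the content of the input field into one list item per line."""
    # splitlines() keeps empty lines as empty-string items, in displayed order.
    return field_text.splitlines()


items = field_to_list("file:/in/a.rtf\n\nfile:/in/b.rtf")
print(items)
# prints: ['file:/in/a.rtf', '', 'file:/in/b.rtf']
```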


3.1.5. Popup menu

This UI element creates a popup menu to pick a single value among a set of pre-defined ones.

It will create a parameter and pipeline variable of type String holding the internal value (see internal-values property for details) representation of the currently selected item.

popup

type

popup

label

text to display as label at the very left

persistent

when true, current values are saved to the pipeline document and re-loaded when the document is next opened

when false, the value is not stored in the document and is assigned the default value (if specified) when the document is next opened

The default value is false.

default

the default internal value for the parameter, if newly created or persistent is false. The value here must be exactly one of the values specified in the internal-values property.

description

the string specified here is displayed as tool-tip text in the UI when hovering over the input element and also used for descriptive messages when running in commandline mode

required

(only used in upCast commandline mode) when true, this means that specifying a value for this parameter is required to run the pipeline

when false, that parameter is not required to be specified but instead, its default value should be used (if specified) or not be set at all if there is no default value

The default value is true.

values

a space- or comma-separated list of values to display in the popup and to pass into the pipeline variables

internal-values

a space- or comma-separated list of internal values. The value set on the pipeline variable is the one from this list whose index matches the selected option from the values property’s list of displayed values. Use this to use descriptive values in the displayed popup, while still getting short enum-type values in your variable. It also allows for easy localization of displayed values without having to change internal processing.

initialize-when

never | unset | always | <string>

Defines parameter values setting and initialization behaviour when the pipeline is run as a sub-pipeline from within an External Pipeline Processor (see there for details); the default value is unset

hidden

when true, this control is not displayed. The default value is false.

locked

when true, this control's value cannot be edited as long as the pipeline has an edit-lock password defined and the edit lock is in effect. The default value is false.

Example 6.5. 

targetType {
    type: popup;
    label: "Target type:";
    persistent: true;
    default: "db4";
    values: "DocBook 4", "DocBook 5", "DITA";
    internal-values: "db4", "db5", "dita";
}

creates a popup with three entries, "DocBook 4", "DocBook 5" and "DITA". The pipeline variable created will be named targetType, and its value will be one of the values "db4", "db5" or "dita", and the value selection will be stored across document openings. The default value of the variable will be "db4" upon field creation.
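The index-based pairing of values and internal-values can be sketched in Python as shown below. The function name internal_value is hypothetical, and requiring both lists to have the same length is an assumption made for this illustration.

```python
def internal_value(selected, values, internal_values):
    """Map the displayed popup entry to the value stored in the variable."""
    if len(values) != len(internal_values):
        # Assumption: both lists are expected to line up one-to-one.
        raise ValueError("values and internal-values must have the same length")
    # The internal value sits at the same index as the selected display value.
    return internal_values[values.index(selected)]


values = ["DocBook 4", "DocBook 5", "DITA"]
internal_values = ["db4", "db5", "dita"]
print(internal_value("DocBook 5", values, internal_values))
# prints: db5
```

Because only the displayed list changes when the UI is localized, code that tests the variable against "db4", "db5" or "dita" keeps working unchanged.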


3.1.6. Checkbox

This UI element creates a labelled check box to enable or disable (i.e. turn on or off, or set to true or false) a specific boolean-valued option.

It will create a parameter and pipeline variable of type Bool holding the boolean representation of the current state of the checkbox (true when checked, false otherwise).

checkbox

type

checkbox

label

text to display as label at the very left

persistent

when true, current values are saved to the pipeline document and re-loaded when the document is next opened

when false, the value is not stored in the document and is assigned the default value (if specified) when the document is next opened

The default value is false.

default

the default value for the parameter, if newly created or persistent is false, either true or false

description

the string specified here is displayed as tool-tip text in the UI when hovering over the input element and also used for descriptive messages when running in commandline mode

required

(only used in upCast commandline mode) when true, this means that specifying a value for this parameter is required to run the pipeline

when false, that parameter is not required to be specified but instead, its default value should be used (if specified) or not be set at all if there is no default value

The default value is true.

text

the checkbox’s label text next to the actual checkbox graphic

initialize-when

never | unset | always | <string>

Defines parameter values setting and initialization behaviour when the pipeline is run as a sub-pipeline from within an External Pipeline Processor (see there for details); the default value is unset

hidden

when true, this control is not displayed. The default value is false.

locked

when true, this control's value cannot be edited as long as the pipeline has an edit-lock password defined and the edit lock is in effect. The default value is false.

Example 6.6. 

includeStyle {
    type: checkbox;
    label: "Option:";
    persistent: false;
    default: true;
    text: "Include style information";
}

creates a checkbox with text "Include style information". The pipeline variable created will be named includeStyle and will have the Boolean value true when the box is checked, false otherwise. The checkbox state will not be remembered across document openings. The default will be the option being checked (=on).


3.1.7. List of strings

This UI element creates a text field for entering a list of arbitrary, single-line strings (one per line).

It will create a parameter and pipeline variable of type List consisting of one line each of the input field as String value (in the displayed order).

list

type

list

label

text to display as label at the very left

persistent

when true, current values are saved to the pipeline document and re-loaded when the document is next opened

when false, the value is not stored in the document and is assigned the default value (if specified) when the document is next opened

The default value is false.

default

the default value for the parameter, if newly created or persistent is false

description

the string specified here is displayed as tool-tip text in the UI when hovering over the input element and also used for descriptive messages when running in commandline mode

required

(only used in upCast commandline mode) when true, this means that specifying a value for this parameter is required to run the pipeline

when false, that parameter is not required to be specified but instead, its default value should be used (if specified) or not be set at all if there is no default value

The default value is true.

lines

the number of lines of the text field, the default is 4

initialize-when

never | unset | always | <string>

Defines parameter values setting and initialization behaviour when the pipeline is run as a sub-pipeline from within an External Pipeline Processor (see there for details); the default value is unset

hidden

when true, this control is not displayed. The default value is false.

locked

when true, this control's value cannot be edited as long as the pipeline has an edit-lock password defined and the edit lock is in effect. The default value is false.

Example 6.7. 

stylenames {
    type: list;
    label: "List of stylenames:";
    description: "list of Word style names to handle";
    persistent: true;
}

creates an entry field for a list of values where each single line corresponds to one list item. An empty line creates a list item consisting of the empty string.


The order in which the parameters are defined determines the display order and the order of parameters in created Java code functions.

Example 6.8. 

Concatenating all individual parameter definition examples from above into one, the following Simple View for the pipeline would be created:


Name

ParameterDefinitions

Java symbol

kParameterDefinitionsParamName

Type

String

Value

Reset persistent values

This clears any currently stored persistent values in the pipeline document.

Tip

You should clear the values when you make significant changes to the parameter definition and prior to saving the pipeline configuration for distribution to your customers, so they do not see your last, private settings you made during development for parameters having persistence turned on.

3.2. Settings

Initialization

This parameter lets you programmatically set pipeline parameters as well as (dynamically) prevent running the pipeline at all. For this, you can write a custom UPL function initialize() by clicking on the Edit initialization code… button.

Note

The text on the Edit initialization code… button will be bold when a custom initialization function has been defined (and therefore the code field is not empty). This lets you quickly see if a pipeline defines a custom initialization function without having to open the code entry dialog.

If the button text is plain, the code field is empty and the pipeline will always be executed.

Tip

If you always want to run the pipeline unconditionally (and don't need the function for parameter overrides), make sure the code field is empty. This lets you see at a glance in the UI whether a custom function is defined, and it protects you against possible future signature changes, and therefore code incompatibilities, in the initialize() function when you effectively don’t even use its features.

If the initialize() function returns EXECUTE (which is the default), the pipeline is further executed.

If the initialize() function returns SKIP, the pipeline’s modules are not executed.

If the initialize() function returns TERMINATE, the pipeline’s action is not performed and additionally, further execution is aborted.

In the initialize() function’s body, you can run arbitrary UPL code. This code is run just before actually performing the pipeline’s programmed functionality. This function hook’s main intent is to give you the possibility to programmatically and dynamically set pipeline parameter values based on e.g. pipeline variable values (which in turn may have been set through the Simple View or by an external parameter passed to the pipeline). This way, you can set a parameter that does not allow you to have variable references expanded, like popups or check boxes. Additionally, this function serves as a dynamically evaluated condition specifying whether to run the pipeline or not.

Name

InitializationCode

Java symbol

kInitializationCodeParamName

Type

String

Value

UPL source code
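The effect of the three return values of initialize() can be pictured with the following Python sketch. It is not UPL and not upCast's implementation: run_pipeline and its return strings are hypothetical, modules are modelled as plain callables, and how the caller aborts further execution after TERMINATE is left open.

```python
def run_pipeline(modules, initialize=None):
    """Run the modules (callables) depending on the result of initialize()."""
    # An empty code field means the pipeline is always executed.
    decision = "EXECUTE" if initialize is None else initialize()
    if decision == "EXECUTE":
        for module in modules:
            module()
        return "executed"
    if decision == "SKIP":
        # The modules are not run, but surrounding execution may go on.
        return "skipped"
    if decision == "TERMINATE":
        # The modules are not run and the caller should abort further execution.
        return "terminated"
    raise ValueError(f"unexpected initialize() result: {decision!r}")


log = []
print(run_pipeline([lambda: log.append("import")], initialize=lambda: "SKIP"))
# prints: skipped
```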

Finalization

This parameter lets you specify the condition under which the pipeline signals an error to its parent, which is the application when it is a top-level pipeline, or the executing component, when it is run as a sub-pipeline (e.g. by the External Pipeline Processor).

In the case of being a top-level pipeline, signalling an execution error will result in an error dialog being shown (if run in the GUI) or an exception being thrown (when run via the Java API).

You can specify the cases in which a pipeline execution failure should be signalled by using several pre-defined, often used conditions, or you can specify a custom condition in UPL:

continue

pipeline execution is always reported as successful

signal on FATAL

signal a pipeline execution failure when during execution, a FATAL log message has been received

signal on ERROR

signal a pipeline execution failure when during execution, a FATAL or ERROR log message has been received. This is the default for new pipelines.

signal on WARN

signal a pipeline execution failure when during execution, a FATAL, ERROR or WARN log message has been received

Log message forwarding behaviour

In all of the four pre-programmed finalization modes above, collected log messages from level WARN and up are forwarded to the parent (usually a pipeline object). See also the section on logging for more details.

custom finalization:

this option lets you specify custom UPL function code which, by returning one of the two Id values TERMINATE or CONTINUE, signals the failure state of the pipeline

To edit the UPL code for the custom finalize() function, click the Edit finalization code… button. By returning the Id TERMINATE, you indicate that the execution of the pipeline has failed, and by returning CONTINUE you indicate that the pipeline execution succeeded.

The custom function receives an Id parameter which is TERMINATE when one of its child modules requested explicit, premature pipeline termination, CONTINUE otherwise.

Example 6.9. Finalization function template

function finalize( $childFinalizationResult as Id ) as Id {
  variable $result as Id := $childFinalizationResult; // default: CONTINUE
  /* Return the Id TERMINATE when you want to terminate the pipeline, CONTINUE otherwise. */
  return $result;
}

Generating a custom error message

Additionally, in the custom finalization code field, you can optionally specify a second UPL function, message-text(). When this function is defined and returns a non-empty string, that string is shown to the user instead of the default message generated by upCast whenever finalize() returns TERMINATE. This allows you to generate error messages that are tailored specifically to your application and its user base.

Example 6.10. Custom message text function template

function message-text() as String {
  variable $result as String := "";
  /* Return a non-empty message string to display an error dialog resp. write the error text to the log. */
  return $result;
}

Name

FinalizationMode

Java symbol

kFinalizationModeParamName

Type

String

Value

continue | signal-fatal | signal-error | signal-warning | custom
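The pre-defined finalization modes can be read as thresholds on the most severe log message received during execution. A minimal Python sketch of that reading follows; the function name signals_failure and the numeric severity ranks are hypothetical, and the custom mode is left out because its result comes from the user-written finalize() function.

```python
# Hypothetical severity ranks for the levels named in the mode descriptions.
SEVERITY = {"WARN": 1, "ERROR": 2, "FATAL": 3}

# Lowest level that triggers a failure signal for each pre-defined mode.
THRESHOLD = {
    "signal-fatal": "FATAL",
    "signal-error": "ERROR",
    "signal-warning": "WARN",
}


def signals_failure(mode, received_levels):
    """Return True when the pipeline would report an execution failure."""
    if mode == "continue":
        return False  # always reported as successful
    if mode not in THRESHOLD:
        raise ValueError(f"unsupported finalization mode: {mode!r}")
    minimum = SEVERITY[THRESHOLD[mode]]
    # Levels below WARN (e.g. INFO) rank as 0 and never trigger a signal.
    return any(SEVERITY.get(level, 0) >= minimum for level in received_levels)


print(signals_failure("signal-error", ["INFO", "WARN"]))
# prints: False
```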

Name

FinalizationCode

Java symbol

kFinalizationCodeParamName

Type

String

Value

UPL source code

Log filter

Sets the logging threshold for messages that the module accepts from children and produces itself (see the logging architecture description for details).

Some default filter expressions are available from the associated popup menu, however you are free to define your own customized filter expressions. The filter expression syntax is described here.

If set to inherit, the logging filter settings are governed by the application's filter settings for the external logger as set in the application's preferences.

Name

LogFilterSpec

Java symbol

kLogFilterSpecParamName

Type

String

Value

inherit | OFF | FATAL | ERROR | WARN | INFO | DEBUG | VERBOSE | DETAIL | TRACE | ALL | logfilterspec

Edit-Lock password

Here, you can specify a password that prevents switching off the Simple View and hence editing the pipeline from within the GUI. It does not encrypt the pipeline document itself!

To remove the lock, clear the password field. The password "__________" (10 underscores) must not be used.

The password is stored in the pipeline document as base64-encoded MD5 hash.
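How such a stored value could be derived is sketched below in Python. The text only states that the password is stored as a base64-encoded MD5 hash; encoding the password as UTF-8 and base64-encoding the raw 16 digest bytes (rather than a hexadecimal string) are assumptions, and edit_lock_digest is a hypothetical name.

```python
import base64
import hashlib


def edit_lock_digest(password):
    """Return a base64-encoded MD5 hash of the given password."""
    # Assumption: the password is hashed as UTF-8 bytes.
    digest = hashlib.md5(password.encode("utf-8")).digest()
    # Assumption: the raw digest bytes, not their hex form, are base64-encoded.
    return base64.b64encode(digest).decode("ascii")


print(len(edit_lock_digest("secret")))
# prints: 24
```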

Name

EditLockPassword

Java symbol

kEditLockPasswordParamName

Type

String

Value

Pipeline UID

To implement file-location independent linking from Parameter Set documents to their implementation pipeline document, each pipeline document must have a unique ID. This need not be a standard UUID when you can guarantee that these IDs will only be used in a controlled environment. In that case, we suggest using speaking IDs, which make it easier for users to manually find the respective pipeline given a UID value; this may be necessary when a link gets broken due to a misconfiguration of the ID resolver.

By default, when a pipeline document is opened that does not yet have a non-empty pipeline ID setting, upCast automatically generates a UID and sets it for that pipeline document.

Name

PipelineUUID

Java symbol

kPipelineUUIDParamName

Type

String

Value

Required upCast build number

Enter the build number of the upCast application that this pipeline requires as a minimum to be able to run. When a user tries to run the pipeline with an application version that has a build number less than the one specified here, a dialog is shown allowing the user to abort the execution of the pipeline (the default), execute it nevertheless (at their own risk), or abort the execution and automatically check for a newer version of the application at the vendor site.

When you leave the field empty, no minimum requirement check is performed.

When no UI is available (e.g. when running from the commandline or via the Java API), execution is aborted and a FATAL log message with details is generated.
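The decision described above can be summarized in a short Python sketch. It assumes the check is a plain integer comparison of build numbers; the function name build_check and the returned labels are hypothetical and only name the three outcomes mentioned in the text.

```python
def build_check(required_field, running_build, has_ui):
    """Decide what happens before a pipeline with a build requirement runs."""
    if required_field.strip() == "":
        return "run"  # empty field: no minimum requirement check
    if running_build >= int(required_field):
        return "run"  # the application is new enough
    # Too old: ask the user in the GUI, otherwise abort with a FATAL message.
    return "ask-user" if has_ui else "abort-with-fatal-log"


print(build_check("4500", 4321, has_ui=False))
# prints: abort-with-fatal-log
```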

Name

RequiredBuildNumber

Java symbol

kRequiredBuildNumberParamName

Type

Integer

Value

Runnable (by itself)

This option tells upCast if the pipeline can run by itself (the default value, i.e. option checked) or not.

You can use this to identify pipelines that are only useful e.g. as sub-pipelines called from other pipelines, especially when they need some setup (for example, when they inherit a current document tree and perform some specialized operations on it, but do not build it themselves).

When this option is not checked, the pipeline cannot be run as a top-level pipeline from the GUI. This is accomplished by disabling the Pipeline > Run command and the Run button in the pipeline window.

3.3. Catalogs

Inherit from parent

When checked, the catalogs set in the nearest pipeline ancestor in the call chain or, if it is the top-level pipeline, of the upCast application preferences will be used. When unchecked, the Catalog files setting as specified will be used.

Name

UseGlobalCatalogs

Java symbol

kUseGlobalCatalogsParamName

Type

Bool

Value

Catalog files

To add a catalog file, choose Add Catalog… and select the catalog file to add from the file system. Each line in the field specifies a catalog location. The new catalog will be available to all modules immediately after closing the preferences window.

When the catalog resides in the file hierarchy below the pipeline base URI, it is made relative by default using the ${pipeline:base} variable notation.

When you hold down the Alt key while clicking Add Catalog…, upCast tries to generate a pipeline base URI-relative path, even when the location is outside the directory subtree under the pipeline base URI.

To remove a catalog, delete the text line containing its location specification.

Catalogs are considered in the order displayed.

Name

Catalogs

Java symbol

kCatalogsParamName

Type

String

Value

one path to a catalog per line as string

3.4. Font Configuration

Inherit from parent

When checked, the font configuration set in the nearest pipeline ancestor in the call chain or, if it is the top-level pipeline, of the upCast application preferences will be used. When unchecked, the Font configuration setting as specified will be used.

Name

UseGlobalFontConfig

Java symbol

kUseGlobalFontConfigParamName

Type

Bool

Value

Font configuration

Specify the source code for the stdfonts.config override that should be used for this pipeline.

Name

FontConfiguration

Java symbol

kFontConfigurationParamName

Type

String

Value

font configuration code

3.5. Encodings

Inherit from parent

When checked, the custom encoding setting set in the nearest pipeline ancestor in the call chain or, if it is the top-level pipeline, of the upCast application preferences will be used. When unchecked, the Custom Encodings setting as specified will be used.

Name

UseGlobalEncodings

Java symbol

kUseGlobalEncodingsParamName

Type

Bool

Value

Custom Encodings

To add a custom encoding file, choose Add Encoding… and select the custom encoding file (*.encoding) to add from the file system. Each line in the field specifies a custom encoding location. When the custom encoding resides in the file hierarchy below the pipeline base URI, it is made relative by default using the ${pipeline:base} variable notation.

To add a folder where upCast should pick all contained custom encoding files, choose Add Encodings Folder… and select it. When the folder resides in the file hierarchy below the pipeline base URI, it is made relative by default using the ${pipeline:base} variable notation.

When you hold down the Alt key while clicking Add Encoding… or Add Encodings Folder…, upCast tries to generate a pipeline base URI-relative path, even when the location is outside the directory subtree under the pipeline base URI.

To remove a custom encoding entry, delete the text line containing its location specification.

Name

CustomEncodings

Java symbol

kCustomEncodingsParamName

Type

String

Value

paths to custom encodings (either to an individual custom encoding file or to a folder containing *.encoding files), with one entry per line

3.6. License

Inherit from parent

When checked, the license set in the nearest pipeline ancestor in the call chain or, if it is the top-level pipeline, of the upCast application preferences will be used. When unchecked, the License location setting as specified will be used.

Name

UseGlobalLicense

Java symbol

kUseGlobalLicenseParamName

Type

Bool

Value

License location

To set the license to be used for running this pipeline, click Choose license file… and select the license file (*.uclicense) to be used.

When the license file resides in the file hierarchy below the pipeline base URI, it is made relative by default using the ${pipeline:PipelineBase} variable.

The license details of the active license are displayed in the fields below for your reference.

Name

LicenseFile

Java symbol

kLicenseFileParamName

Type

String

Value

path to license file

3.7. Export

Java source export

Fully qualified class name

Specify the fully qualified class name where the Java source code export option should place the generated code. Any required, nesting package folders will be created automatically by the File > Export to Java Source… function.

Name

ExportJavaClass

Java symbol

kExportJavaClassParamName

Type

String

Value

Source root folder

Specify the absolute path to the source root folder, i.e. the root of the Java package hierarchy subdirectories. You can use the ${pipeline:base} variable as the first component of the path to specify the source root relative to the pipeline base URI.

When this field is left empty, upCast will ask for the source root folder every time you call the File > Export to Java Source… function. When this field is non-empty, that value will be used silently when calling the Java source export function.

With the Choose… button, you can request a file chooser to pick the Java source root folder. When this is a subdirectory of the pipeline base URI, the path is automatically made relative to it.

Tip

When you press the Alt key while clicking Choose…, upCast tries to always make the path relative, even if it is not in the subtree under the pipeline base URI.

Name

ExportJavaSourceRoot

Java symbol

kExportJavaSourceRootParamName

Type

String

Value

Ant build module export

‘upcast.jar’ location (or Ant expression)

Here you specify the path or Ant expression that points to the upcast.jar file containing the actual Java code for the task; it is inserted into the upcast Ant task definition. If you leave this field empty, "upcast.jar" will be used in the created Ant file module when using File > Export as Ant Task….

Note

The text you enter here is first processed by the usual upCast variable resolution mechanism. This has the advantage that you can use upCast variables for calculating the path. You must, however, take care to quote the ‘$’ character (dollar sign) when you want it verbatim, e.g. to reference Ant variables.

So you could use something like

$${basedir}/tasks/upcast.jar

to keep the generated Ant file portable by referring to upcast.jar relatively from the Ant build file’s base directory.
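The effect of quoting the dollar sign can be illustrated with a small Python resolver. This is a simplified stand-in for upCast's variable resolution, not its actual implementation: the function name resolve, the sample variable value and the choice to fail on unknown variables are assumptions made for this sketch.

```python
import re

# Matches either a quoted dollar sign ($$) or a ${name} variable reference.
_TOKEN = re.compile(r"\$\$|\$\{([^}]*)\}")


def resolve(text, variables):
    """Expand ${name} references and turn $$ into a literal dollar sign."""
    def replace(match):
        if match.group(0) == "$$":
            return "$"  # quoted dollar sign is written verbatim
        # Assumption: an unknown variable is an error (raises KeyError here).
        return variables[match.group(1)]
    return _TOKEN.sub(replace, text)


print(resolve("$${basedir}/tasks/upcast.jar", {}))
# prints: ${basedir}/tasks/upcast.jar
```

The resolved text still contains ${basedir}, so Ant, not upCast, expands it when the build file is run.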

Name

ExportAntJarLocation

Java symbol

kExportAntJarLocationParamName

Type

String

Value

literal Ant value for task’s ‘basedir’ attribute

Here you specify the literal code to be used for the upcast task’s basedir attribute in the generated target. This is useful to calculate the pipeline base URI relative to some Ant property and thus make the generated build module position independent. upCast variables are resolved as usual before writing the resulting text to the build file.

Example 6.11. 

To calculate the pipeline base URI to be used by the task relative to the position of the build file, you may want to use a setting like

$${basedir}/MyPipelineRoot/

Note how you must quote the ‘$’ character (dollar sign) to avoid upCast trying to treat it as an upCast variable and expand it.


Name

ExportAntBasedir

Java symbol

kExportAntBasedirParamName

Type

String

Value

literal Ant code for <source> selection

Here you specify the literal Ant source XML code to be inserted into the generated target code for selecting the source file(s) to be used.

With the Add source… button, you can generate code for a single source file.

When holding down the Meta (Mac OS X: Command) key while clicking the Add source… button, you can generate code for all files in the selected folder. A commented-out line for filtering based on extension is automatically generated, which you can uncomment and fill in as desired.

For both cases, when additionally holding down the Alt key, the reference generated will be relative to the literal value specified in the literal Ant value for task’s ‘basedir’ attribute field. For this, a special local variable ${taskbase} is used, which gets replaced by the resolved contents of the literal Ant value for task’s ‘basedir’ attribute parameter.

For the syntax used for source specification, see the description of the upCast Ant task.

upCast variables are resolved as usual before writing the resulting text to the build file, including the resolution of the special ${taskbase} variable as the last resolution step.

Name

ExportAntSourceCode

Java symbol

kExportAntSourceCodeParamName

Type

String

Value

3.8. Documentation

This is a free form text field for adding notes or documentation to this pipeline setup. You can use HTML tags which are copied verbatim into the generated documentation for the pipeline (via File > Generate Documentation…).

Name

ModuleDocumentation

Java symbol

kModuleDocumentationParamName

Type

String

Value

HTML code (will be copied into generated HTML documentation)

Chapter 7. Module-level settings

Each module type has its own, dedicated set of parameters to control its behavior. A few parameters are shared by all modules, both in name and semantics. These are listed explicitly below. However, all other parameter names are to be interpreted with the context of the module’s functionality in mind to infer their meaning.

Internally, parameters of modules are dynamically, weakly typed, though each parameter has a recommended or even required (by definition) type.

1. Typographical conventions

Parameters will be described using the following typography:

Name

DeleteEmpties

Java symbol

kDeleteEmptiesParamName

Type

Boolean

Value

false, true

Name gives the internal java.lang.String name of the parameter, which is used for storing in preferences files, and can be used in the Java API as String. However, in Java, the use of the Java symbol is highly recommended instead.

Java symbol names the Java constant definition for the parameter’s Name. All constants are defined in the class de.infinityloop.common.Params.

Type specifies the recommended Java type to use when programming against the API. In the GUI version, that is taken care of automatically. When you use alternative interfaces like an Ant task, which only allows passing arguments as character strings, upCast tries to perform an appropriate cast. You should therefore make sure that the data you provide in these cases can be cast to the specified type, as otherwise the conversion will fail or produce incorrect results at runtime.

Value describes possible value ranges, supported keywords or other specifics about that parameter’s range.

2. Module Options (common for all modules)

The following parameters are available on all modules:

Active checkbox

When the "active" checkbox in the upper left corner of the module parameter pane is checked, the module is active in the pipeline.

During pipeline development, it is often useful to have several differently configured modules to switch between, or to have modules inserted in the pipeline that generate some sort of debug output. Unchecking this parameter lets you temporarily disable a module quickly, without having to delete it from the pipeline and insert it again later.

Deactivated modules are completely skipped during a pipeline run and impose only minimal overhead – actually, it’s just writing a line to the log file.

Name

ModuleEnabled

Java symbol

kModuleEnabledParamName

Type

Bool

Value

true | false

Name

Here, you can assign a meaningful name to a module instance. By default, modules’ names are their type, like "XSLT Processor" or "RTF Importer". However, when you have e.g. several XSLT processors in your pipeline, it is desirable to use more meaningful names, like "strip namespaces XSLT" or "TEI conversion transformation".

Name

InstanceNameUser

Java symbol

kModuleInstanceNameUserParamName

Type

String

Value

an arbitrary string

Export

When checked, this module is handled (exported) in a File > Export… function.

You can use this to set up a single upCast pipeline document in such a way that for export to Java code or an Ant task, only certain modules will be exported. This lets you use some module instances for debugging in the UI, which then won’t be part of an exported pipeline representation.

For the Export as XML… function, module elements will have an additional attribute export with value true or false, respectively. This allows you to decide in any custom post-processing of that pipeline export format whether you want to handle that module in a special way (for example, discarding it completely, as the built-in export options Ant and Java source do).

Name

ModuleExported

Java symbol

kModuleExportedParamName

Type

Bool

Value

true | false

Initialization

This parameter lets you programmatically set module parameters as well as (dynamically) prevent running the module even when its active checkbox is checked. For this, you can write a custom UPL function initialize() by clicking on the Edit initialization code… button.

Note

The text on the Edit initialization code… button will be bold when a custom initialization function has been defined (and therefore the code field is not empty). This lets you quickly see if a module defines a custom initialization function without having to open the code entry dialog.

If the button text is plain, the code field is empty and the module will always be executed.

Tip

If you always want to run the module unconditionally, make sure the code field is empty. This lets you see at a glance in the UI whether a custom function is defined, and it protects you against possible future signature changes, and therefore code incompatibilities, in the initialize() function when you effectively don’t even use its features.

If the initialize() function returns EXECUTE (which is the default), the module’s action is performed.

If the initialize() function returns SKIP, the module’s action is not performed and the subsequent module in the pipeline (if there is one) is run.

If the initialize() function returns TERMINATE, the module’s action is not performed and additionally, further pipeline execution is aborted.
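The effect of the three return values can be sketched as a simple run loop in Python. This is only a model of the behaviour described above, not upCast code: run_pipeline and the tuple layout of the modules list are hypothetical.

```python
EXECUTE, SKIP, TERMINATE = "EXECUTE", "SKIP", "TERMINATE"

def run_pipeline(modules):
    """modules: list of (name, initialize, action); returns names of executed modules."""
    executed = []
    for name, initialize, action in modules:
        status = initialize()
        if status == TERMINATE:
            break        # action not performed, further pipeline execution aborted
        if status == SKIP:
            continue     # action not performed, the subsequent module is run
        action()
        executed.append(name)
    return executed

modules = [
    ("import", lambda: EXECUTE, lambda: None),
    ("debug dump", lambda: SKIP, lambda: None),
    ("export", lambda: TERMINATE, lambda: None),
    ("cleanup", lambda: EXECUTE, lambda: None),
]
print(run_pipeline(modules))  # ['import']
```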

In the initialize() function’s body, you can run arbitrary UPL code. This code is run just before actually performing the module’s functionality. This function hook’s main intent is to give you the possibility to programmatically and dynamically set module parameters’ values based on e.g. pipeline variable values (which in turn may have been set through the Simple View or by an external parameter passed to the pipeline). This way, you can set a parameter that does not allow you to have variable references expanded, like popups or check boxes. Additionally, this function serves as a dynamically evaluated condition specifying whether to run the module or not (in contrast to the module’s static Active checkbox).

Example 7.1. 

Assuming you are offering your users the choice between the HTML and CALS table model by way of a pipeline parameter tableType (e.g. in the Simple View), the following code sets the corresponding module parameter TableModel dynamically in the XML Export module. This would not be otherwise possible via that module’s UI since for the selection, a popup is used which has no way to calculate its value based on pipeline variables.

The code assumes that the pipeline parameter tableType can have one of two values: html or cals.

#namespace module "http://www.infinity-loop.de/namespace/upcast-realm/module";
#namespace pipeline "http://www.infinity-loop.de/namespace/upcast-realm/pipeline";
function initialize() as Id {
  $module:TableModel := $pipeline:tableType;
  return EXECUTE; /* run the module */
}

Name

InitializationCode

Java symbol

kInitializationCodeParamName

Type

String

Value

UPL source code

Finalization

This parameter lets you specify the condition under which further pipeline execution should be cancelled after running this module.

This parameter will only be evaluated (and therefore have any effect) if the module action was actually performed, or in other words: if initialize() did not prevent the execution of the module’s action by returning TERMINATE or SKIP.

Normally, pipeline execution continues with the following defined modules even if in the current one there was a warning or error. These messages are collected and then displayed in the final pipeline execution error dialog. However, sometimes this is not a desired behaviour. Specifically, when subsequent modules rely on the proper execution of their predecessors to produce usable or correct results or – even more importantly – to not cause harm to data integrity, it may be necessary to immediately stop further execution of the pipeline when some module produces an error.

You can specify the termination behaviour by choosing one of several pre-defined, frequently used conditions, or you can specify a custom condition in UPL:

continue

continue pipeline execution no matter what, i.e. even when an ERROR or FATAL error has occurred

signal on FATAL

terminate pipeline execution when during execution of this module, a FATAL error message has been generated

signal on ERROR

terminate pipeline execution when during execution of this module, a FATAL or ERROR error message has been generated. This is the default value for new module instances.

signal on WARN

terminate pipeline execution when during execution of this module, a FATAL, ERROR or WARN message has been generated

custom finalization:

this option lets you specify custom UPL function code which, by returning one of the two Id values TERMINATE or CONTINUE, can request pipeline termination or continuation after this module.

To edit the UPL code for the custom finalize() function, click the Edit finalization code… button. By returning the Id TERMINATE, you can request pipeline termination, and by returning CONTINUE as result you can request pipeline continuation.

The custom function takes as Id parameter the termination status of its child component if there is any, CONTINUE otherwise.

Example 7.2. Finalization function template

function finalize( $childFinalizationResult as Id ) as Id {
  variable $result as Id := $childFinalizationResult; // default: CONTINUE
  /* Return the Id TERMINATE when you want to terminate the pipeline, CONTINUE otherwise. */
  return $result;
}

Name

FinalizationMode

Java symbol

kFinalizationModeParamName

Type

String

Value

continue | signal-fatal | signal-error | signal-warning | custom

Name

FinalizationCode

Java symbol

kFinalizationCodeParamName

Type

String

Value

UPL source code
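The decision made by the FinalizationMode values can be summarized in a short Python sketch. It assumes that the severity levels are ordered WARN < ERROR < FATAL, as the mode descriptions above imply; the function should_terminate and its arguments are hypothetical and do not correspond to any upCast API.

```python
# Severity ranks; the numeric values are illustrative only.
SEVERITY = {"WARN": 1, "ERROR": 2, "FATAL": 3}

# Lowest severity that triggers termination for each pre-defined mode.
THRESHOLD = {"signal-warning": "WARN", "signal-error": "ERROR", "signal-fatal": "FATAL"}

def should_terminate(mode, message_levels, custom=None, child_result="CONTINUE"):
    """Return True if the pipeline should stop after this module."""
    if mode == "continue":
        return False
    if mode == "custom":
        # The custom function receives the child's status and returns an Id.
        return custom(child_result) == "TERMINATE"
    limit = SEVERITY[THRESHOLD[mode]]
    return any(SEVERITY.get(level, 0) >= limit for level in message_levels)
```

For example, with the default mode signal-error, a module that logged only a WARN message lets the pipeline continue, while a single ERROR or FATAL message stops it.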

Log filter

Sets the logging threshold for messages that the module accepts from children and produces itself (see the logging architecture description for details).

Some default filter expressions are available from the associated popup menu; however, you are free to define your own customized filter expressions. The filter expression syntax is described here.

If set to inherit, the logging filter settings are governed by the module’s execution parent’s settings (that is usually the pipeline it is contained in).

Name

LogFilterSpec

Java symbol

kLogFilterSpecParamName

Type

String

Value

inherit | OFF | FATAL | ERROR | WARN | INFO | DEBUG | VERBOSE | DETAIL | TRACE | ALL | logfilterspec

Documentation

This is a free form text field for adding notes or documentation specific to this module instance. You can use HTML tags which are copied verbatim into the generated documentation for the pipeline (via File > Generate Documentation…).

Name

ModuleDocumentation

Java symbol

kModuleDocumentationParamName

Type

String

Value

HTML code (will be copied into generated HTML documentation)

3. Action Settings (module-specific)

Parameters belonging to specific modules are described in that module’s detailed description.

Chapter 8. Modules

This section describes the available modules in more detail, listing available parameters. Filter type identifiers are given in square brackets after the UI module name.

1. Pipeline Variables [pipelinevars]

This module allows you to set some commonly used global variables easily for re-use in subsequent modules. It is therefore most useful as the first module in a pipeline.

You can set the global variables pipeline:SourceFile, pipeline:TemporaryItemsFolder, pipeline:DestinationFolder, pipeline:ImageDestinationFolder and pipeline:DebugFolder.

The effect of using this module in API mode is the same as using UpcastEngine.setPipelineVariable().

All parameters have a type of java.lang.String.

Important

When a field of the pre-defined parameters is left empty, that parameter is not set at all. This allows you to place a module of this type somewhere in the middle of a pipeline and have it set or override only certain parameters (either custom parameters or selected pre-defined ones). All pre-defined parameters whose entry fields are empty keep their previously assigned values (or are not created).

This also means that if you want to assign the empty string to some parameter, you can only do so by specifying it in the Custom pipeline variables field.

Custom pipeline variables

Here, you can specify additional global values for use in subsequent modules. The definitions herein are processed after the fixed global parameters described above are evaluated and set, so you can refer to them using the usual ${pipeline:…} variable reference. A parameter definition must follow this syntax:

varname ‘:=’ ‘"’ value ‘"’;

Quotes within the variable value must themselves be quoted using the backslash character ‘\’.

Important

The text in this field and any contained variable references are resolved as follows:

  1. Any references to the include realm are resolved.

  2. The individual assignments are parsed according to the above syntax.

  3. Only now, any further references to realms other than include are resolved separately for each value part of an assignment.

This algorithm covers the usual cases where you might want to include constant assignment code shared by several pipelines using an include variable reference, but also reference variable values without having to worry whether their actual value breaks the value assignment syntax described above.
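The three resolution steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the module's implementation: it assumes that each assignment ends with a semicolon, that variable names consist of word characters, dots and hyphens, and that include references are written as ${include:…}. The names parse_assignments and resolve_field are hypothetical.

```python
import re

# One assignment: name := "value";  (a quote inside the value is written \")
ASSIGNMENT = re.compile(r'\s*([\w.-]+)\s*:=\s*"((?:\\.|[^"\\])*)"\s*;')
REFERENCE = re.compile(r"\$\{([^}]*)\}")

def parse_assignments(text):
    result, pos = {}, 0
    while text[pos:].strip():
        match = ASSIGNMENT.match(text, pos)
        if match is None:
            raise ValueError(f"syntax error at offset {pos}")
        result[match.group(1)] = match.group(2).replace('\\"', '"')
        pos = match.end()
    return result

def resolve_field(text, includes, variables):
    # Step 1: resolve references to the include realm in the raw field text.
    text = REFERENCE.sub(
        lambda m: includes[m.group(1)[len("include:"):]]
        if m.group(1).startswith("include:") else m.group(0),
        text)
    # Step 2: parse the individual assignments.
    assignments = parse_assignments(text)
    # Step 3: resolve all remaining references separately for each value.
    return {name: REFERENCE.sub(lambda m: variables[m.group(1)], value)
            for name, value in assignments.items()}
```

Because step 3 runs after parsing, a variable whose value contains quotes or semicolons cannot break the assignment syntax, which is the property the algorithm above is designed to provide.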

Name

PipelineVariables

Java symbol

kPipelineVariablesParamName

Type

String

Value

(string in same syntax as in corresponding UI field)

2. RTF Importer (upCast) [rtfimport]

Note

This module requires an appropriate RTF Importer feature included in your license to be fully functional.

This importer module handles conversion from RTF to the internal, unified upCast format, the upCast Internal DTD. With WordLink enabled, the filter can also convert Word binary files (*.doc).

Optional hyphen RTF symbol, Soft Hyphen (U+00AD) character

The RTF importer outputs the RTF Optional Hyphen symbol (\-) as codepoint U+E003 in the Unicode Private Use Area. This allows subsequent pipeline steps to distinguish it from Soft Hyphen (U+00AD) characters entered directly in the RTF as Unicode. The distinction matters because downstream rendering engines treat the two differently from the way Word displays them.

However, the Unicode Translation Map in effect in the XML Exporter module maps U+E003 to U+00AD by default. If you need or want to change the translation of RTF’s Optional Hyphen symbol to something other than the Soft Hyphen character in Unicode, you must change or override the default mapping of the source codepoint U+E003 in the XML Exporter module.
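The default mapping and an override of it can be pictured with a minimal Python sketch. It does not use the XML Exporter's actual Unicode Translation Map format; the function translate and its mapping argument are hypothetical stand-ins.

```python
OPTIONAL_HYPHEN = "\uE003"   # private-use codepoint emitted by the RTF importer
SOFT_HYPHEN = "\u00AD"

def translate(text, mapping=None):
    """Apply the default U+E003 -> U+00AD mapping, optionally overridden."""
    table = {ord(OPTIONAL_HYPHEN): SOFT_HYPHEN}
    if mapping:
        table.update({ord(src): dst for src, dst in mapping.items()})
    return text.translate(table)
```

Calling translate("co\uE003op") yields the text with a Soft Hyphen, while translate("co\uE003op", {"\uE003": ""}) drops the optional hyphen entirely.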

Parameters are grouped logically into tabs:

General

Source file

Specify the source file in RTF or, if WordLink is available, in Word binary format that should be imported.

Name

SourceFile

Java symbol

kSourceFileParamName

Type

Object

Value

absolute file path in URL or local file system convention

Hoist common inline properties to parent

If enabled, any inline formatting CSS property that extends over all children of a paragraph-level element with the same value will be hoisted to the parent object as a style override. This makes use of CSS inheritance and optimizes the output by specifying that particular property only once on the parent instead of on each of its child elements.

Name

HoistCommonInlines

Java symbol

kHoistCommonInlinesParamName

Type

Bool

Value

true | false

Remove empty inlines

If enabled, any inline style specifications that do not contain any #PCDATA or similar, visually rendered content, are discarded from the document.

The default for this parameter is off based on the assumption that you may want to keep e.g. formatting information for empty cells so that a user may later fill in text and has the correct, originally intended formatting information available at that document location.

Name

RemoveEmptyInlines

Java symbol

kRemoveEmptyInlinesParamName

Type

Bool

Value

true | false

Allow ‘class’ and ‘style’ attributes simultaneously on <inline> elements

When on, this option allows both a class and a style attribute to be present on the same element. Otherwise, the two are separated and an anonymous inline element is created for the style attribute instead.

Option checked:

This is <uci:inline uci:class="slang" uci:style="color: blue;">True Blue</uci:inline>.

Option unchecked:

This is <uci:inline uci:class="slang"><uci:inline uci:style="color: blue;">True Blue</uci:inline></uci:inline>.

You might want to use this option to have named Word styles always separated out in a dedicated element so that additional override styles can be recognized quickly by the additional inline element.

Name

CombineWithLogicalStyle

Java symbol

kCombineWithLogicalStyleParamName

Type

Bool

Value

true | false

Markup revision tracking using <inserted> and <deleted>

When this is checked, document revisions are marked up in the result using the inserted and deleted elements.

If this is off, only the result of the revisions will be exported, i.e. inserted content remains in the document and deleted content is removed.

Name

RevisionTracking

Java symbol

kRevisionTrackingParamName

Type

Bool

Value

true | false

Use CSS for forced pagebreaks (where possible)

When checked, the importer tries to use CSS code for specifying forced pagebreaks wherever possible by using the page-break-before: always property/value combination.

If this is off, a pagebreak element will always be used.

Name

UseCSSForPagebreaks

Java symbol

kUseCSSForPagebreaksParamName

Type

Bool

Value

true | false

Apply list structuring heuristics

If checked, special list structure detection algorithms are performed to create the best logically structured XML output. If unchecked, Word’s internal list IDs are used to track where a list starts and ends and where a new one begins, which may (based on the editing history of a particular list) have virtually no resemblance to what you are actually seeing in the layout.

The default value is on.

Name

ApplyListHeuristics

Java symbol

kApplyListHeuristicsParamName

Type

Bool

Value

true | false

Flatten list structures

If checked, list/item element structures are not created. Instead, all elements that would constitute list item contents get the respective info attached as attributes:

uci:list-firstitem

'true' when that element is part of the first list item of the respective list, 'false' if it is in a subsequent item (not the first one)

uci:list-hasmarker

'true' when this is the first element of a list item (the one that has the list marker before it), 'false' for any subsequent elements in a certain list item

uci:numberingtext

(on the first element of a list item only) the numbering text (effective marker text) of that list item

Additionally, the following list-specific properties are copied onto each of the (virtual) list item's content elements:

  • list-style-type

  • -ilx-list-level

  • -ilx-list-group

  • -ilx-list-numbering-absolute

  • -ilx-marker-align

  • -ilx-marker-follow

  • -ilx-marker-offset

  • -ilx-marker-format

  • -ilx-marker-font-family

  • -ilx-marker-font-size

  • -ilx-marker-color

The default value of this parameter is off.
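How a later processing step might rebuild list and item boundaries from the flattened attributes is sketched below in Python. The sketch handles a single list level only and ignores the -ilx-list-level property; the dictionary representation of elements and the function regroup are hypothetical.

```python
def regroup(elements):
    """Rebuild lists -> items -> element texts from flattened attributes."""
    lists = []
    for element in elements:
        starts_item = element["uci:list-hasmarker"] == "true"
        starts_list = starts_item and element["uci:list-firstitem"] == "true"
        if starts_list or not lists:
            lists.append([])          # a new list begins with its first item
        if starts_item or not lists[-1]:
            lists[-1].append([])      # the element carrying the marker opens an item
        lists[-1][-1].append(element["text"])
    return lists

flat = [
    {"uci:list-firstitem": "true", "uci:list-hasmarker": "true", "text": "A"},
    {"uci:list-firstitem": "true", "uci:list-hasmarker": "false", "text": "A2"},
    {"uci:list-firstitem": "false", "uci:list-hasmarker": "true", "text": "B"},
    {"uci:list-firstitem": "true", "uci:list-hasmarker": "true", "text": "C"},
]
print(regroup(flat))  # [[['A', 'A2'], ['B']], [['C']]]
```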

Name

FlattenListStructures

Java symbol

kFlattenListStructuresParamName

Type

Bool

Value

true | false

Flatten options

Here you can control some additional aspects of list flattening:

serialize markers

when checked, the actual list marker text for each item is inserted as first child node in any element that has the uci:list-hasmarker attribute set to 'true', i.e. any element that is first in a list item. The default value is off.

Default font size

Some RTF documents do not specify a default font size for their text content, but rely on the default of the rendering application (like Microsoft Word). This parameter lets you set the default font size for such documents.

Microsoft Word applications up to and including Word 97 used a default value of 10pt, Word 2000 and later use a default of 12pt. When you set this parameter to * (i.e. automatic), upCast tries to guess from the RTF symbols it finds in the document whether it is a Word 2000 (or later) document and then will use 12pt as default font size, 10pt otherwise.

Name

DefaultFontSize

Java symbol

kDefaultFontSizeParamName

Type

String

Value

'*' | 1..999
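The automatic selection can be expressed as a small Python sketch. It assumes that explicit values are given in pt and takes the result of the Word version guess as a plain boolean; how upCast actually derives that guess from the RTF symbols is not modelled. The function default_font_size is hypothetical.

```python
def default_font_size(setting, is_word_2000_or_later):
    """Return the default font size in pt for a DefaultFontSize setting."""
    if setting == "*":
        # Automatic: 12pt for Word 2000 and later, 10pt otherwise.
        return 12 if is_word_2000_or_later else 10
    size = int(setting)
    if not 1 <= size <= 999:
        raise ValueError("DefaultFontSize must be '*' or 1..999")
    return size
```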

Note handling options: Marker mismatch handling

This option lets you specify what should happen when the reference marker to a footnote, endnote or annotation in the body text of the document does not match the marker within the footnote, endnote or annotation definition itself:

do nothing

the mismatch is silently accepted

issue warning

a warning for the mismatch is issued (symbolic name for the warning is NoteMarkerMismatch)

issue error

an error for the mismatch is issued (symbolic name for the error is NoteMarkerMismatch); this is the default setting.

The RTF Importer uses pre-defined Word character styles ("footnote reference", "endnote reference", "annotation reference") to find the marker information in a footnote, endnote or annotation definition, respectively. In normal use of Word, these styles are set automatically and correctly. However, when markers are edited manually and carelessly in a footnote definition, the marker-identifying style may inadvertently also be applied to some or all of the footnote contents. upCast tries to automatically remove the marker portion from the footnote definition, because that information is usually (re-)created at the final output or rendering stage. A wrong setting of the respective character styles may therefore lead to actual footnote content getting lost. Such authoring mistakes are usually detected by checking whether the reference marker text in the document body is the same as the marker in the definition that the RTF Importer detects based on the special character style settings. Issuing an error when that text is different alerts the user that there's probably something wrong with the document that needs checking and/or fixing in order to not lose any content.

Name

NoteMarkerValidation

Java symbol

kNoteMarkerValidationParamName

Type

String

Value

none | warn | error
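The check behind this option amounts to a comparison of two marker strings. The following Python sketch models it under the assumption that markers are compared as plain text; check_note_marker and its return convention are hypothetical.

```python
def check_note_marker(reference_marker, definition_marker, mode="error"):
    """Return None if nothing is reported, else (level, symbolic name)."""
    if mode not in ("none", "warn", "error"):
        raise ValueError(f"unknown NoteMarkerValidation value: {mode}")
    if mode == "none" or reference_marker == definition_marker:
        return None
    level = "WARN" if mode == "warn" else "ERROR"
    return (level, "NoteMarkerMismatch")
```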

Note handling options: Marker in note definition

This option lets you specify what should happen with any (repeated) marker text in the actual footnote definition:

remove based on note reference character style

The RTF Importer uses pre-defined Word character styles ("footnote reference", "endnote reference", "annotation reference") to find the marker information in a footnote, endnote or annotation definition, respectively. When this option is set, any leading contents in a footnote definition that has one of the mentioned styles is removed from the note definition content. This is the default setting.

remove based on content of reference

If the note body (uci:content) starts with the same text as is present in the note reference (uci:reference), that part is removed from the beginning of the footnote body content. Differences in style and in leading and trailing whitespace are ignored in the comparison, and the whitespace is removed as well. Automatic note numbering placeholders are considered as expected during this text prefix comparison.

include unchanged

the footnote, endnote or annotation content is kept unmodified (i.e. as present in the Word document) in the XML output, including any contained repetition of the note's reference marker

Name

NoteMarkerHandling

Java symbol

kNoteMarkerHandlingParamName

Type

String

Value

remove-style | remove-content | include
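The remove-content behaviour can be approximated on plain text with the Python sketch below. It ignores styles and automatic note numbering placeholders, which the real comparison takes into account; strip_marker is a hypothetical helper.

```python
def strip_marker(reference, content):
    """Remove a leading copy of the reference marker from the note content."""
    marker = reference.strip()   # whitespace differences are not considered
    body = content.lstrip()
    if marker and body.startswith(marker):
        return body[len(marker):].lstrip()
    return content               # no matching prefix: keep content unchanged
```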

Literal pass-through styles

If checked, you can specify a set of (Word) styles that should be treated as literals. The styles are given by their exact names, separately for the paragraph style and the character style category. All text in the document set using these styles will be written to the output without any interpretation by upCast. This lets you write e.g. XHTML or XML code directly within your document the way it should appear at that location in the output.

Name

LiteralProcessing

Java symbol

kLiteralProcessingParamName

Type

Bool

Value

true | false

Paragraph style names

When Literal pass-through styles is on, specify here the list of paragraph styles that should be treated as literal content indicators. Enclose the name of each style in double quotes and separate the styles by a space character.

Name

LiteralParStyle

Java symbol

kLiteralParagraphStyleParamName

Type

String

Value

style name

Character style names

When Literal pass-through styles is on, specify here the list of character styles that should be treated as literal content indicators. Enclose the name of each style in double quotes and separate the styles by a space character.

Name

LiteralCharStyle

Java symbol

kLiteralCharacterStyleParamName

Type

String

Value

style name
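The list format used by the two style name fields can be read with a one-line Python sketch. It assumes that style names do not themselves contain double quotes; parse_style_names and the style names in the example are hypothetical.

```python
import re

def parse_style_names(value):
    """Return the style names from a space-separated list of double-quoted names."""
    return re.findall(r'"([^"]*)"', value)

print(parse_style_names('"Literal XML" "Code"'))  # ['Literal XML', 'Code']
```

Quoting each name is what allows style names that contain spaces, such as "Literal XML", to be told apart from two separate styles.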

Images

Include images

When checked, images contained in the document are processed as configured by the image processing parameters. If unchecked, all images of the source document will be completely discarded from the document.

Name

IncludeImages

Java symbol

kIncludeImagesParamName

Type

Bool

Value

true | false

Temporary images folder

This is the location where images from an imported document are temporarily stored while the pipeline is processed. It is e.g. the responsibility of an exporter to copy images that are intended to be kept permanently after a pipeline run to a different location.

Note

A pipeline keeps track of temporary images created in the above location. After finishing a pipeline run, all these recorded temporary files are automatically deleted.

Name

TemporaryItemsFolder

Java symbol

kTemporaryItemsFolderParamName

Type

String

Value

path to temporary items folder

Use inline copies instead of referenced original images (if available)

When this option is checked, for images that have been included in the RTF document using both methods, by reference and by embedding, the module will try to use the embedded substitute representation. This option essentially breaks the link to the original image file, if a substitute representation has been embedded in the RTF file, and instead links to the embedded representation of the original file.

When an image has only been linked and no substitute representation is available in the RTF, however, the original link to the image is preserved and used.

Name

InlineReferencedImages

Java symbol

kPreferEmbeddedImagesParamName

Type

Bool

Value

true | false

Incoming images default resolution

This parameter determines the image resolution in dpi (dots per inch) to use for embedded images that do not specify their resolution explicitly. This is true for all (originally) GIF images and some variants of JPEG and PNG images.

Without any dpi information, the RTF importer (and, as a matter of fact, even Word) cannot determine the absolute size of images, which is necessary to create a fully specified export file. This parameter is then used to establish a default dpi value and corresponds roughly to Word’s Web Options > Image resolution setting.

When setting this to the default ‘*’ value, the RTF importer determines the absolute size of the image from the image properties in the RTF document (if available) and modifies the embedded image data by adding the resolution determined from the (absolute size/number of pixels)-pair to the externalized image. This ensures that subsequent processors can correctly determine absolute sizes and scale any images accordingly.

Tip

If you have control over the original document generation process and especially image creation, make sure that each image you add to a Word or RTF document contains explicit resolution information, as this avoids all sorts of platform incompatibilities.

This rule in particular excludes importing GIF images, as the GIF format does not include resolution information. However, several Clip Art images in JPEG and PNG format also lack this information, so that the displayed image size in a document becomes dependent on platform, Word version or setting of the Web Options > Image resolution parameter – which is generally undesirable.

Outgoing images rendering resolution

This value affects the WMF to pixmap renderer built into the RTF Importer. This means that WMF (or EMF) images will be rendered into a pixmap with pixel dimensions for width and height that correspond to this value.

The default value is 96 dpi (used e.g. by Microsoft’s Internet Explorer™). You may want to change this when outputting for Netscape Navigator 4.7 on the Mac, which by default displays at 72 dpi and therefore would downscale images written using 96 dpi resolution.

Suppose you have a WMF image in your document that is 2 by 1 inches in size. With 96 dpi output resolution, this will yield a pixmap of size 192 by 96 pixels.

However, if you set the output resolution to only 72 dpi, the resulting pixmap will be 144 by 72 pixels in size.
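The arithmetic behind these figures is size in inches multiplied by the rendering resolution. A minimal Python sketch, in which the rounding to whole pixels is an assumption and pixmap_size is a hypothetical helper:

```python
def pixmap_size(width_in, height_in, dpi):
    """Pixel dimensions of a vector image rendered at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)

print(pixmap_size(2, 1, 96))  # (192, 96)
print(pixmap_size(2, 1, 72))  # (144, 72)
```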

Name

ImageRenderingResolution

Java symbol

kImageRenderingResolutionParamName

Type

Integer

Value

20..360

Export embedded images of type…

While exporting embedded images, you have the option to convert them to a different format.

The RTF Importer includes a custom WMF to pixmap renderer fully programmed in Java. It is neither intended nor recommended for production quality image conversion! To perform high-quality image conversion, we strongly encourage you to consider specialized third-party products. Nevertheless, the built-in renderer is useful and intended for producing draft image renderings for viewing in a web browser or creating documents for editorial review and should perform well enough for most purposes except final publishing.

Embedded images in an RTF document can be of several image format types: WMF, EMF, JPEG, PNG and Macintosh PICT. The RTF importer lets you specify a handling method for each of these formats, so you can e.g. use already pixel based images like JPEG or PNG unchanged while rendering vector formats like WMF into a pixel-based representation.

The following handling methods are available (some of which are not applicable to all source formats):

(no change)

Export the embedded image as binary data without any modification applied

external cmd:

Export the embedded image as binary data without any modification applied, and then run the specified external command on it for further processing. (See below for details.)

*remove*

The image will be completely removed from the document

JPEG

The image will be converted into JPEG format, using the built-in WMF to pixmap renderer if necessary. Clicking Options… lets you set the JPEG compression quality.

PNG

The image will be converted into PNG format, using the built-in WMF to pixmap renderer if necessary. Clicking Options… lets you set the PNG compression algorithm.

BMP

The image will be converted into Windows bitmap (BMP) format, using the built-in WMF to pixmap renderer if necessary.

PICT

The image will be converted into Macintosh PICT format, using the built-in WMF to pixmap renderer if necessary. Note that only the image map operator is supported. The RTF importer will not translate WMF vector operators into native PICT operators.

When using the option external cmd, two additional parameters can be set:

File extension

Enter the file extension that the image file has after the external conversion. For example, if you want to convert a WMF file to TIFF, the extension should be tif or tiff.

Command

This is the external command to execute for converting the image source file to the desired target format. You must use placeholders for the source and destination file name using the upCast variable syntax. The variables to use are:

${imgsrc#local}

the image source file in local file name convention

${imgsrc#url}

the image source file in URL format

${imgdest#local}

the destination file name in local file name convention

${imgdest#url}

the destination file name in URL format

This works as follows: The file to be converted is available at the location in imgsrc#local. The RTF importer then constructs a target file name, using the source file name as basis, but setting the extension to the one specified. Since the RTF importer needs to know the final resulting filename for referring to the externally converted image in the internal document tree, but there is no way to return a string from a shell command easily (just an integer return code), it prescribes the target file name itself. This is what the variable imgdest#local is for. You must make sure that the final, processed image file is available at the location contained in that specific variable.

Example 8.1. Example:

To convert a WMF file to JPEG, use settings like:

WMF to [external cmd:]

File extension: [jpg]

Command: [fileconverter -fmt jpeg -outfile ${imgdest#local} ${imgsrc#local}]
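The file name handling described above can be pictured with a minimal Python sketch. It is an illustration only, not the RTF importer's actual code: the helper names derive_dest and expand_command and the temporary path are hypothetical, and only the two local-path placeholders are substituted.

```python
from pathlib import PurePosixPath


def derive_dest(src: str, extension: str) -> str:
    """Build the target name from the source name, replacing only the extension."""
    return str(PurePosixPath(src).with_suffix("." + extension.lstrip(".")))


def expand_command(template: str, src: str, dest: str) -> str:
    """Substitute the two local-path placeholders used in the example above."""
    return (template
            .replace("${imgsrc#local}", src)
            .replace("${imgdest#local}", dest))


src = "/tmp/upcast/img0001.wmf"  # hypothetical temporary image location
dest = derive_dest(src, "jpg")   # value of the "File extension" field
command = expand_command(
    "fileconverter -fmt jpeg -outfile ${imgdest#local} ${imgsrc#local}",
    src, dest)
print(command)
```

The point of the sketch is that the destination name is fixed before the command runs, so the external tool must write its result exactly there.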


Name

WMFDestFormat

Java symbol

kWMFDestFormatParamName

Type

String

Value

unchanged | dispose | ExternalCommand | JPEG | PNG | BMP | PICT

Name

EMFDestFormat

Java symbol

kEMFDestFormatParamName

Type

String

Value

unchanged | dispose | UseWMFSubstitute | ExternalCommand

Name

JPEGDestFormat

Java symbol

kJPEGDestFormatParamName

Type

String

Value

unchanged | dispose | ExternalCommand | JPEG | PNG | BMP | PICT

Name

PNGDestFormat

Java symbol

kPNGDestFormatParamName

Type

String

Value

unchanged | dispose | ExternalCommand | JPEG | PNG | BMP | PICT

Name

PICTDestFormat

Java symbol

kPICTDestFormatParamName

Type

String

Value

unchanged | dispose | ExternalCommand | JPEG | PNG | BMP | PICT

Name

WMFDest.JPEG.Quality

Java symbol

kWMFDestJPEGQualityParamName

Type

Integer

Value

0..100

Name

WMFDest.PNG.CompressionType

Java symbol

kWMFDestPNGCompressionTypeParamName

Type

String

Value

default | fast | max | none

Name

JPEGDest.JPEG.Quality

Java symbol

kJPEGDestJPEGQualityParamName

Type

Integer

Value

0..100

Name

JPEGDest.PNG.CompressionType

Java symbol

kJPEGDestPNGCompressionTypeParamName

Type

String

Value

default | fast | max | none

Name

PNGDest.JPEG.Quality

Java symbol

kPNGDestJPEGQualityParamName

Type

Integer

Value

0..100

Name

PNGDest.PNG.CompressionType

Java symbol

kPNGDestPNGCompressionTypeParamName

Type

String

Value

default | fast | max | none

Name

PICTDest.JPEG.Quality

Java symbol

kPICTDestJPEGQualityParamName

Type

Integer

Value

0..100

Name

PICTDest.PNG.CompressionType

Java symbol

kPICTDestPNGCompressionTypeParamName

Type

String

Value

default | fast | max | none

Objects

These parameters specify how embedded objects (OLE) should be handled. The RTF importer generates an uci:object element for each embedded object it finds in the RTF. The child elements of this container object are alternative representations of the object’s data. This can be an uci:image (if available in the source document: represents the current display of that object at the time of saving the document), or an uci:ole element (if available: it contains a base64 representation of the binary data of the OLE object, which makes it possible to reconstruct it to an editable instance using the RTF exporter).

Include image representation

When checked, an image representation alternative will be added to the object element (if available in the source document).

Embed binary OLE object data as base64-encoded text

When checked, an uci:ole binary data representation alternative will be added to the object element. The uci:ole element contains the base64-encoded binary data as character data.

Extract object from OLE object and serialize to file

When checked, an uci:extobject element within the uci:object element is written with the following attributes:

uci:rawtarget

the raw target specification of the externalized file

xlink:href

the URL (possibly relative) to the externalized file

uci:mimetype

the MIME type of the externalized object file

Currently, the following OLE CLSIDs are supported for externalization:

AcroExch.Document.7

serialization to PDF; the mime type used is application/pdf

ExcelSheet.8

serialization to *.xls; the mime type used is application/vnd.msexcel

If the OLE object is of some type that is not explicitly supported, a warning is issued to the logging system and the object (unwrapped from the embedded OLE object in the Word document) is written to the file. Note that in most cases, you will not be able to use that file with the application that originally created the OLE, as the data structure of such objects is proprietary to that application and extracting the correct portion of the data from the OLE object requires knowledge of that particular application's OLE file format.

Note

To have a not yet supported OLE format (CLSID) supported for externalization, please contact us at support@infinity-loop.de. Please include a small sample file that contains an instance of an OLE object of that particular application, and specify exactly which application and version you used to create that OLE object.

Include MathML representation

When MathLink is available (that is, Design Science’s MathType software, version 5.2, is installed on your Windows system and upCast runs on that same machine), you can also embed a MathML representation of the formula of a MathType OLE object in the object element as an m:math element.

Important

Since MathLink is only available on the Windows platform, this option will only be enabled when a functioning MathLink actually is available to the application.

Name

ObjectHandling

Java symbol

kObjectHandlingParamName

Type

String

Value

image || embed || mathml || extract (separated by whitespace if more than one)
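As a small illustration of the value format, the following Python sketch assembles an ObjectHandling string from individual tokens. The helper name object_handling_value is hypothetical; only the four token names and the whitespace separator come from the table above.

```python
ALLOWED_TOKENS = ("image", "embed", "mathml", "extract")


def object_handling_value(*tokens: str) -> str:
    """Join the chosen tokens with single spaces, dropping duplicates."""
    unknown = [t for t in tokens if t not in ALLOWED_TOKENS]
    if unknown:
        raise ValueError(f"unsupported token(s): {unknown}")
    # dict.fromkeys keeps the first occurrence of each token, in order
    return " ".join(dict.fromkeys(tokens))


value = object_handling_value("image", "extract")
print(value)
```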

WordLink

Set WordLink features.

Important

Since WordLink is only available on the Windows platform, this tab will only be displayed when WordLink actually is available to the application.

When opening a pipeline definition file created on the Windows platform on some other platform, existing settings will be preserved on save, but will have no effect during execution on that non-Windows platform.

Mode

When Process .doc files only is selected, WordLink and all options specified will only be applied to Word binary (*.doc) files.

When Process all files is selected, WordLink and all options specified will be applied to any input document, i.e. even files that are in RTF format already. This lets you automatically update fields or add pagestart and linestart elements.

Name

WordLinkMode

Java symbol

kWordLinkModeParamName

Type

String

Value

doc | all

Run macro named “il_premacro”

When checked, WordLink will first run a Word macro named il_premacro on the source document. This macro must either be defined in the respective document (when it is a Word binary .doc file) or in the global document template file (*.dot).

When this macro is not available, an error will be issued after conversion, though the further conversion process is not affected.

Update fields

When checked, WordLink will update any fields in the source document with current values: date, time, pages, …

Update from linked images

When including an image only by reference (i.e., using Word’s INCLUDEPICTURE field), the RTF importer is not able to determine the actual image size as that information is not part of RTF. By checking this option, the linked image is temporarily included into the document with the effect that image size and possibly applied scaling in the .doc Word binary file can be evaluated by the importer.

This feature is not beneficial for RTF source files, as in these the necessary information is already lost (also for Word).

Mark up layout page breaks using <pagestart />

This inserts a <pagestart /> empty inline element at each place where the current layout flow would produce a dynamic page break when rendering the document.

Mark up layout line breaks using <linestart />

This inserts a <linestart /> empty inline element at each place where the current layout flow would produce a dynamic line break when rendering the document.

Important

This is slow for documents bigger than about 100 pages. You may want to increase the Kill timeout value significantly. Also, some document structure constellations may yield wrong line break position results due to limitations in the Word application.

Name

WordLinkCommand

Java symbol

kWordLinkCommandParamName

Type

String

Value

Pages || Update || Premacro || Lines || Includelinkedimages || Updatelinks (concatenate desired options without any whitespace in between)
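Unlike ObjectHandling, this value is built by plain concatenation. The following Python sketch shows one way to assemble it. The helper name wordlink_command is hypothetical, and whether the order of the options matters to WordLink is not stated here, so the sketch simply keeps the order given.

```python
WORDLINK_OPTIONS = ("Pages", "Update", "Premacro", "Lines",
                    "Includelinkedimages", "Updatelinks")


def wordlink_command(*options: str) -> str:
    """Concatenate the chosen option names without any separator."""
    unknown = [o for o in options if o not in WORDLINK_OPTIONS]
    if unknown:
        raise ValueError(f"unsupported option(s): {unknown}")
    # dict.fromkeys drops repeated options while keeping their order
    return "".join(dict.fromkeys(options))


value = wordlink_command("Update", "Pages", "Lines")
print(value)
```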

Kill timeout

When hitting a corrupt document, WordLink may have problems and/or hang the application. Therefore, you can set a kill timeout value after which the WordLink functions will be aborted. The default value is 300 seconds.

Note

Killing WordLink may leave an invisible instance of Word running. After a timeout, please check the running processes and kill any zombie Word processes manually using the Process Viewer (Ctrl-Alt-Del on Windows 2000/XP).

Name

WordLinkKillTimeout

Java symbol

kWordLinkKillTimeoutParamName

Type

Integer

Value

timeout duration in milliseconds

Copy temporary .rtf file to debug folder as "basename-tmp.rtf"

This is mainly for debugging purposes. It copies the intermediate RTF file to the specified debug folder with a name of basename-tmp.rtf after having applied all WordLink functions. This is the file that the RTF importer itself takes as source for its actual conversion process.

Name

WordLinkCopyToOutput

Java symbol

kWordLinkCopyToOutputParamName

Type

Bool

Value

true | false

3. UPL Processor [uplcode]

Note

This module requires an appropriate UPL feature included in your license to be fully functional.

This module lets you run a program written in the Upcast Processing Language (UPL).

The single context node for all UPL code in this module is the document root node (XPath: / ). Note that this is different from the document root element!

UPL code

This contains the UPL code you want to execute. The code must define a function main() as follows:

function main() as Value {
   ... your code goes here ...
}

The UPL Processor calls this function main() once when it runs and executes the code defined therein (or in any dependent, user-defined functions). For a detailed description of UPL, see the separate documentation, Upcast Processing Language.

The returned result of the function is stored into the pipeline variable ModuleResult.

Name

UPLCode

Java symbol

kUPLCodeParamName

Type

String

Value

UPL source code

UPL parameters

This contains the assignments for variables to be passed to the UPL program. For a detailed description of how UPL receives parameter values, see the separate documentation, Upcast Processing Language.

A parameter definition must follow this syntax:

paramname ':=' '"' value '";'

Quotes within the parameter value must themselves be quoted using the backslash character ‘\’.

Important

The text in this field and any contained variable references are resolved as follows:

  1. Any references to the include realm are resolved.

  2. The individual assignments are parsed according to the above syntax.

  3. Only now, any further references to realms other than include are resolved, separately for the value part of each parameter assignment.

This algorithm covers the usual cases where you might want to include constant parameter assignment code shared by several UPL modules in a pipeline using an include variable reference, but also reference variable values without having to worry whether their actual value breaks the value assignment syntax described above.
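The three resolution steps can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the module's actual parser: the set of characters allowed in a parameter name is a guess, and the two resolver callbacks as well as the ${include} and ${name} placeholders in the usage lines are hypothetical stand-ins for real variable references.

```python
import re
from typing import Callable, Dict

# name := "value";  where \" may appear inside the value
# (the characters allowed in a name are an assumption of this sketch)
ASSIGNMENT = re.compile(r'\s*([A-Za-z_][\w.-]*)\s*:=\s*"((?:\\.|[^"\\])*)"\s*;')


def parse_parameters(text: str,
                     resolve_includes: Callable[[str], str],
                     resolve_other: Callable[[str], str]) -> Dict[str, str]:
    text = resolve_includes(text)            # step 1: include realm first
    params: Dict[str, str] = {}
    pos = 0
    while text[pos:].strip():
        match = ASSIGNMENT.match(text, pos)  # step 2: parse one assignment
        if match is None:
            raise ValueError(f"malformed assignment at offset {pos}")
        name, raw = match.group(1), match.group(2)
        value = raw.replace('\\"', '"')      # undo backslash-quoted quotes
        params[name] = resolve_other(value)  # step 3: other realms, per value
        pos = match.end()
    return params


shared = 'lang := "en";'  # stands for shared, included assignment code
params = parse_parameters(
    '${include} title := "He said \\"hi\\""; author := "${name}";',
    resolve_includes=lambda t: t.replace("${include}", shared),
    resolve_other=lambda v: v.replace("${name}", 'Ann "A." Lee'))
print(params)
```

Because step 3 runs only after parsing, the quotes inside the resolved author value cannot break the assignment syntax.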

Name

UPLParameters

Java symbol

kUPLParametersParamName

Type

String

Value

(string in same format as in UI)

4. UPL Tree-Processor [upl]

Note

This module requires an appropriate UPL feature included in your license to be fully functional.

This module differs from the UPL Processor in that it does not call a single function once. Instead, you define code to be run upon visiting each node of the current internal document in a depth-first traversal, depending on certain conditions you specify.

UPL code

This contains the UPL code you want to execute. For a detailed description of the UPL, see the separate documentation, Upcast Processing Language.

Name

UPLCode

Java symbol

kUPLCodeParamName

Type

String

Value

UPL source code

UPL parameters

This contains the assignments for variables to be passed to the UPL program. For a detailed description of how UPL receives parameter values, see the separate documentation, Upcast Processing Language.

A parameter definition must follow this syntax:

paramname ':=' '"' value '";'

Quotes within the parameter value must themselves be quoted using the backslash character ‘\’.

Important

The text in this field and any contained variable references are resolved as follows:

  1. Any references to the include realm are resolved.

  2. The individual assignments are parsed according to the above syntax.

  3. Only now, any further references to realms other than include are resolved, separately for the value part of each parameter assignment.

This algorithm covers the usual cases where you might want to include constant parameter assignment code shared by several UPL modules in a pipeline using an include variable reference, but also reference variable values without having to worry whether their actual value breaks the value assignment syntax described above.

Name

UPLParameters

Java symbol

kUPLParametersParamName

Type

String

Value

(string in same format as in UI)

Grouper

When turned on, the grouping algorithm will be run on the internal tree. This happens before the finalize() or finalize-error() UPL method is called.

Name

RunGrouper

Java symbol

kRunGrouperParamName

Type

Bool

Value

Grouping processing order

This parameter lets you set the order of the colors in which the grouping should be performed.

With all colors in alphabetical order, all colors that have been used for setting painters or tags are grouped in alphabetical order of the color name. The ordering is the same as the Java class java.util.TreeSet uses by default on the platform you are running upCast on.

With colors in specified order, then all remaining:, you can specify a list of colors you want to be grouped in a given order before all others in the text field below, with any remaining ones being grouped in alphabetical order (see all colors in alphabetical order) afterwards.

With only these colors in specified order, you can specify the colors and their order of grouping in the text field below. No other colors will be grouped, even if respective painters or tags have been placed on nodes in the internal document tree.

After a run of the grouper, all painters, tags and paintings of nodes are removed from the internal tree. This means that any further grouper instances running after a grouper has already run will have no effect unless new painters and tags have been placed on nodes of the internal document tree, usually using the UPL processor.

Color order is specified by listing the colors in sequence, separated by whitespace. If a color name includes whitespace (which is deprecated), the full color name must be enclosed in double quotes.

Name

GroupingColorOrder

Java symbol

kGroupingColorOrderParamName

Type

String

Value

alphabetic | only | first

Name

GroupingColors

Java symbol

kGroupingColorsParamName

Type

String

Value

ordered list of color names, separated by whitespace; to use color names that themselves contain whitespace, surround them by double-quotes
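How the two parameters interact can be sketched in Python. This is an illustration, not the grouper's code. It assumes that alphabetic, first and only correspond to the three UI choices in the order described above, it uses shlex.split to honor double-quoted color names, and it uses Python's default string ordering as a stand-in for the TreeSet ordering. The function name grouping_order is hypothetical.

```python
import shlex


def grouping_order(mode: str, colors: str, used: set) -> list:
    """Return the colors in the order in which they would be grouped."""
    specified = shlex.split(colors)  # handles "light blue" style names
    if mode == "alphabetic":
        return sorted(used)
    if mode == "first":
        # listed colors first, then all remaining ones alphabetically
        return specified + sorted(used - set(specified))
    if mode == "only":
        return specified
    raise ValueError(f"unknown GroupingColorOrder: {mode}")


used_colors = {"green", "red", "blue", "light blue"}
order = grouping_order("first", 'red "light blue"', used_colors)
print(order)
```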

Element Splitter

When turned on, any split actions attached to nodes using mark-split() in the internal tree will be executed. This happens after running the grouper (if enabled), but before the finalize() or finalize-error() UPL method is called.

Name

RunSplitter

Java symbol

kRunSplitterParamName

Type

Bool

Value

5. Sectioner [sectioner]

The Sectioner module is used for creating a nested, deeper structure based specifically on elements that have a heading level set (via set-heading-level() in UPL), and uci:part elements that have the grouping property set (via set-grouping() in UPL).

Sectioning works only on the direct children of the uci:body element in the upCast Internal DTD.

5.1. Handling of uci:part elements

If the algorithm finds a uci:part element, it checks its grouping property. If this uci:part is a grouping part, all uci:body element children between this uci:part element and the next uci:part element that has the grouping property set will be surrounded by this uci:part element.

Example 8.2. Example:

…
<part is-grouping="true"/>
<par>…</par>
<par>…</par>
<part is-grouping="false"/>
<par>…</par>
<part is-grouping="true"/>
<par>…</par>
…

will be transformed by a run of the Sectioner into

…
<part is-grouping="true">
    <par>…</par>
    <par>…</par>
    <part is-grouping="false"/>
    <par>…</par>
</part>
<part is-grouping="true">
    <par>…</par>
…

Note that namespace prefixes/definitions have been omitted in the above for better readability.
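The transformation in this example can be approximated by a few lines of Python. The sketch models the children of uci:body as a flat list of dictionaries. This representation and the function name group_parts are assumptions made for illustration, not the Sectioner's internal data model.

```python
def group_parts(children: list) -> list:
    """Move everything after a grouping part into that part, up to the next one."""
    result = []
    current = None  # the grouping part currently collecting children
    for child in children:
        if child["tag"] == "part" and child.get("grouping"):
            current = {"tag": "part", "grouping": True, "children": []}
            result.append(current)
        elif current is not None:
            current["children"].append(child)  # includes non-grouping parts
        else:
            result.append(child)               # content before the first part
    return result


children = [
    {"tag": "part", "grouping": True},
    {"tag": "par"}, {"tag": "par"},
    {"tag": "part", "grouping": False},
    {"tag": "par"},
    {"tag": "part", "grouping": True},
    {"tag": "par"},
]
grouped = group_parts(children)
print([(p["tag"], len(p["children"])) for p in grouped])
```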


<part> is grouping (by default)

When checked, even though you may not have specified this explicitly on each uci:part element (e.g. in UPL), all uci:part elements are treated as if they had set the grouping property by default. This mimics the behavior of pre-6.0 versions of upCast.

Name

PartIsGrouping

Java symbol

kPartIsGroupingParamName

Type

Bool

Value

true | false

5.2. Handling of other elements

Any elements that have a uci:heading-level attribute with a value greater than 0 are considered headings of the respective structure level. The Sectioner creates sections based on the heading level information on those elements by automatically creating a surrounding uci:section element, taking care to match the section nesting to the element’s heading level. This means that if there is a jump in heading level, the Sectioner will automatically generate additional, grouping uci:section elements.

When an element with the same heading level is found as the current section nesting, the current section is closed and a new one is opened at the same level.

When an element with a higher heading level than the current one is encountered, a new, nested section is created within the current section.

When an element with a lower heading level than the current one is encountered, the appropriate number of open, nested sections is closed (including the one with the same nesting level) and a new one is opened.

Here’s an example demonstrating all possible cases (assume all elements and attributes are in the uci namespace):

Example 8.3. Example: section nesting based on paragraph’s heading level

<par>…</par> 
<par heading-level="1">…</par>
<par>…</par>
<par heading-level="2">…</par>
<par>…</par> 
<par heading-level="4">…</par>
<par>…</par> 
<par heading-level="3">…</par>
<par>…</par> 
<par heading-level="1">…</par>
<par>…</par>

will result in the following structure generated:

<par>…</par>
<section level="1">
  <par heading-level="1">…</par>
  <par>…</par>
  <section level="2">
    <par heading-level="2">…</par>
    <par>…</par>
    <section level="3">
      <section level="4">
        <par heading-level="4">…</par>
        <par>…</par> 
      </section>
    </section>
    <section level="3">
      <par heading-level="3">…</par>
      <par>…</par> 
    </section>
  </section>
</section>
<section level="1">
  <par heading-level="1">…</par>
  <par>…</par>
</section>

Note that namespace prefixes/definitions have been omitted in the above for better readability.
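The nesting rules above map naturally onto a stack of open sections. The following Python sketch reproduces the structure of this example. It is an illustration under simplifying assumptions: paragraphs are modeled as (text, heading level) pairs with level 0 meaning no heading, the function name build_sections is hypothetical, and the special handling of empty headings described below is left out.

```python
def build_sections(pars: list) -> list:
    """Nest (text, heading_level) pairs into sections using a stack."""
    root = {"level": 0, "children": []}
    stack = [root]  # currently open sections, outermost first
    for text, level in pars:
        if level > 0:
            # same or lower heading level: close the open sections down to it
            while stack[-1]["level"] >= level:
                stack.pop()
            # open one section per level, filling jumps with grouping sections
            while stack[-1]["level"] < level:
                section = {"level": stack[-1]["level"] + 1, "children": []}
                stack[-1]["children"].append(section)
                stack.append(section)
        stack[-1]["children"].append((text, level))
    return root["children"]


levels = [0, 1, 0, 2, 0, 4, 0, 3, 0, 1, 0]  # heading levels from the example
tree = build_sections([("…", lvl) for lvl in levels])
print(len(tree))
```

The jump from level 2 to level 4 is what produces the additional level 3 section that contains no heading of its own.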


The sectioning algorithm can be modified by two options:

Create <section> for empty headings

The default sectioning algorithm only creates a new section for the first of consecutive elements having a uci:heading-level attribute of the same value (if it is not empty).

The idea behind this option is that the user may have created a heading in Word, then hit return (not changing the style) to create visual space, and only then started writing the actual content. You certainly would not want to have a section on its own for each of the visual space generating empty heading-styled paragraphs, but only for the first one, so section nesting generation is suppressed for the remaining heading-styled paragraphs.

If, however, you want to create section nesting corresponding to each heading-styled paragraph in a document, even if it’s empty, check this option.

Name

GroupEmptyHeadings

Java symbol

kGroupEmptyHeadingsParamName

Type

Bool

Value

true | false

Create <sectionintro> around leading section content

Sections created using the sectioning algorithm may have leading content before any subsections they may also have. Checking this option allows you to have this leading content, up to the start of the first nested (sub-)section, grouped by an uci:sectionintro element, e.g. for easier post-processing later with XSLT.

You can choose whether you want the uci:sectionintro element to be created in any case (always) or only when the respective uci:section actually has sub-sections (when sub-sections exist).

In this example, assume all elements and attributes are in the uci namespace:

Example 8.4. Grouping the section introduction

<par heading-level="1">…</par>
<par>…</par>
<table>…</table>
<par>…</par>
<par heading-level="2">…</par>
<par>…</par>

will be transformed to the following when Create <sectionintro> around leading section content is checked with the always option:

<section level="1">
  <sectionintro>
    <par heading-level="1">…</par>
    <par>…</par>
    <table>…</table>
    <par>…</par>
  </sectionintro>
  <section level="2">
    <sectionintro>
      <par heading-level="2">…</par>
      <par>…</par>
    </sectionintro>
  </section>
</section>

or it will be transformed to the following when Create <sectionintro> around leading section content is checked with the when sub-sections exist option:

<section level="1">
  <sectionintro>
    <par heading-level="1">…</par>
    <par>…</par>
    <table>…</table>
    <par>…</par>
  </sectionintro>
  <section level="2">
    <par heading-level="2">…</par>
    <par>…</par>
  </section>
</section>

Note that namespace prefixes/definitions have been omitted in the above for better readability.
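The difference between the two modes can be sketched in Python on an already sectioned tree. The dictionary representation and the function name wrap_intro are assumptions made for illustration. The sketch also assumes that the parameter value child corresponds to the when sub-sections exist choice, and it does not try to decide what happens when a section has no leading content at all.

```python
def wrap_intro(section: dict, mode: str) -> dict:
    """Return a copy of the section with its leading content wrapped per mode."""
    kids = [wrap_intro(k, mode) if k["tag"] == "section" else k
            for k in section["children"]]
    # index of the first nested section; everything before it is leading content
    first_sub = next((i for i, k in enumerate(kids) if k["tag"] == "section"),
                     len(kids))
    has_subsections = first_sub < len(kids)
    if mode == "always" or (mode == "child" and has_subsections):
        intro = {"tag": "sectionintro", "children": kids[:first_sub]}
        kids = [intro] + kids[first_sub:]
    return {**section, "children": kids}


section = {"tag": "section", "children": [
    {"tag": "par"}, {"tag": "par"}, {"tag": "table"}, {"tag": "par"},
    {"tag": "section", "children": [{"tag": "par"}, {"tag": "par"}]},
]}
always = wrap_intro(section, "always")
child = wrap_intro(section, "child")
print([k["tag"] for k in child["children"]])
```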


Name

GroupSectionIntro

Java symbol

kGroupSectionIntroParamName

Type

String

Value

never | always | child

6. [DEPRECATED] Grouper [grouper]

Important

This module is deprecated and must no longer be used in new development of processing pipelines. It will be removed completely in a future version of upCast. Update any of your existing pipeline definitions as soon as possible by transitioning to the use of the functionally equivalent Grouper option of the UPL Tree Processor module.

The Grouper module actually performs a grouping that has been earlier specified during a run of an UPL Tree Processor.

Grouping processing order

This parameter lets you set the order of the colors in which the grouping should be performed.

With all colors in alphabetical order, all colors that have been used for setting painters or tags are grouped in alphabetical order of the color name. The ordering is the same as the Java class java.util.TreeSet uses.

With colors in specified order, then all remaining:, you can specify a list of colors you want to be grouped in a given order before all others in the text field below, with any remaining ones being grouped in alphabetical order (see all colors in alphabetical order) afterwards.

With only these colors in specified order, you can specify the colors and their order of grouping in the text field below. No other colors will be grouped, even if respective painters or tags have been placed on nodes in the internal document tree.

After a run of the grouper, all painters, tags and paintings of nodes are removed from the internal tree. This means that any further grouper instances running after a grouper has already run will have no effect unless new painters and tags have been placed on nodes of the internal document tree, usually using the UPL processor.

Color order is specified by listing the colors in sequence, separated by whitespace. If a color name includes whitespace (which is deprecated), it must be enclosed in double quotes.

Name

GroupingColorOrder

Java symbol

kGroupingColorOrderParamName

Type

String

Value

alphabetic | only | first

Name

GroupingColors

Java symbol

kGroupingColorsParamName

Type

String

Value

ordered list of color names, separated by whitespace; to use color names that themselves contain whitespace, surround them by double-quotes

7. XML Importer [xmlimport]

This module imports any XML document into the internal tree variable, replacing any existing document. This is useful when you want to apply some of the specific UPL functions on it and need not rely on styling info (which is currently not imported/recognized and cannot be created within upCast).

Important

Special element: uci:text

There is one element, uci:text, that is handled specially on import: It is discarded, and its first Text node child receives the uci:text element's uci:node-id value as the value that will be returned from generate-id() called on it (unless a different node in the partially constructed document tree already carries that id value).

Source File

This parameter lets you choose the source XML file to import.

Name

SourceFile

Java symbol

kSourceFileParamName

Type

Object

Value

absolute file path in URL or local file system convention

Discard ignorable whitespace

If the document to be imported has a DOCTYPE declaration associating a DTD with the document, the parser can identify ignorable whitespace. When this parameter is checked, ignorable whitespace will be discarded from the internal tree.

Name

DiscardIgnorableWhitespace

Java symbol

kDiscardIgnorableWhitespaceParamName

Type

Bool

Value

External DTD subset fallback

This parameter lets you choose a fallback for the external DTD subset for the imported document. This is useful if you want to import an XML document that does not have a DOCTYPE declaration, but you have an XML DTD that you know the imported document must satisfy. Specifying the external DTD subset here (DTD file) allows you to supply that info to the parser, which in turn may use that info to determine which whitespace is ignorable in the imported document.

Essentially, specifying a file here has the same effect as if the imported XML document had a

<!DOCTYPE root SYSTEM "specified-file">

declaration, unless it explicitly specifies a DOCTYPE declaration of its own.

Name

ExternalSubsetLocation

Java symbol

kExternalSubsetLocationParamName

Type

String

Value

Attach locator info

When checked, each element node of the imported document will get attached the following set of attributes:

uci:starttag-start-line

the line where the start tag of this element starts in the imported XML document file (position of the opening '<')

uci:starttag-start-col

the column where the start tag of this element starts in the imported XML document file (position of the opening '<')

uci:starttag-end-line

the line where the start tag of this element ends in the imported XML document file (position of the first character after the closing '>')

uci:starttag-end-col

the column where the start tag of this element ends in the imported XML document file (position of the first character after the closing '>')

uci:endtag-start-line

the line where the end tag of this element starts in the imported XML document file (position of the opening '<')

uci:endtag-start-col

the column where the end tag of this element starts in the imported XML document file (position of the opening '<')

uci:endtag-end-line

the line where the end tag of this element ends in the imported XML document file (position of the first character after the closing '>')

uci:endtag-end-col

the column where the end tag of this element ends in the imported XML document file (position of the first character after the closing '>')

All values are 1-based.
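To make the 1-based convention concrete, the following Python sketch computes the equivalent of the two starttag-start values with the standard xml.parsers.expat module. It only illustrates the numbering and says nothing about how the XML Importer obtains its values. Note that expat reports columns 0-based, so the sketch adds 1. The function name start_tag_positions is hypothetical.

```python
import xml.parsers.expat


def start_tag_positions(xml_text: str) -> list:
    """Return (element name, line, column) of each start tag's '<', 1-based."""
    parser = xml.parsers.expat.ParserCreate()
    found = []

    def on_start(name, attrs):
        # expat lines are 1-based, columns 0-based: shift the column by one
        found.append((name,
                      parser.CurrentLineNumber,
                      parser.CurrentColumnNumber + 1))

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return found


positions = start_tag_positions("<a>\n  <b/>\n</a>")
print(positions)
```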

Name

AttachLocatorInfo

Java symbol

kAttachLocatorInfoParamName

Type

Bool

Value

8. XML Exporter [xmlexport]

This module serves for serializing the internal tree to XML. It offers a choice for the table model to write (internal, HTML or CALS), debugging and pretty-printing options. It also offers choices for handling images in the document (separate for referenced/linked images and embedded images) and you can use a Unicode Translation Map.

General

Destination File

Choose the full filename into which the result should be written. You can use upCast’s variables for building the path.

Name

DestinationFile

Java symbol

kDestinationFileParamName

Type

String

Value

absolute path to desired result file

Output resolution

Specify the output resolution in dpi. This value is used for calculating device pixel values, e.g. in HTML tables’ cell widths or images’ sizes.

Name

OutputResolution

Java symbol

kOutputResolutionParamName

Type

Double

Value

1..9999
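As a worked illustration of what the resolution means, the following Python sketch converts a length in points to device pixels. It assumes 72 points per inch and rounding to the nearest pixel. The rounding rule and the function name points_to_pixels are assumptions of this sketch, not documented behavior of the exporter.

```python
def points_to_pixels(points: float, dpi: float) -> int:
    """Convert a length in points (1/72 inch) to device pixels at the given dpi."""
    if not 1 <= dpi <= 9999:
        raise ValueError("OutputResolution must be within 1..9999")
    return round(points / 72.0 * dpi)


width_px = points_to_pixels(144, 96)  # a 2 inch wide table cell at 96 dpi
print(width_px)
```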

Output file encoding

Lets you specify the encoding in which the XML file will be written. If your further tool chain allows it, we strongly recommend using the default, UTF-8.

Name

OutputEncoding

Java symbol

kOutputEncodingParamName

Type

String

Value

Java encoding name

Table model

This parameter lets you choose which table model to use for tables. You can either choose the native (upCast) table model, which is a very simple table > row > cell model, the HTML 4 table model, or the OASIS-EM (CALS) (OASIS XML Exchange Table Model, a subset of CALS) table model.

The HTML 4 table model uses the HTML namespace http://www.w3.org/HTML/1998/html4, the CALS table model uses the special, proprietary namespace http://www.infinity-loop.de/namespace/2006/upcast-cals.

Name

TableModel

Java symbol

kTableModelParamName

Type

String

Value

HTML | CALS | native

Style information

Lets you specify how general CSS styles for known elements and named styles for paragraphs and inline elements should be exported. Options are:

none

No style info is exported at all. This does not affect local styles on elements, which will be written in any case according to the "Explode CSS style info" setting.

internal (<style> element)

The style info is written as CSS code in the special element uci:style (in the upCast internal namespace) in the document’s uci:head element.

external (default file)

Writes a stylesheet processing instruction to point to a CSS file named basename.css in the same folder as the resulting XML file. This file can e.g. be created using the CSS Exporter module.

custom stylesheet PI

This lets you specify a custom stylesheet processing instruction to e.g. link to a general CSS file you wish to use in all of the converted documents.

Name

StylesheetMode

Java symbol

kStylesheetModeParamName

Type

String

Value

none | internal | external | custom

Name

CustomStylesheetPI

Java symbol

kCustomStylesheetPIParamName

Type

String

Value

custom stylesheet PI string

Include generator info as comment

When checked, adds info about when and by which version of upCast the XML file was produced to that file as an XML comment. This may be useful both for infinity-loop support during troubleshooting and for you, when you need to relate some produced XML files to a certain version (in time) of your pipelines.

Name

IncludeGeneratorInfo

Java symbol

kIncludeGeneratorInfoParamName

Type

Bool

Value

true | false

Wrap individual Text nodes with element <uci:text>

When checked, each single Text node in the internal DOM tree will be surrounded by a <uci:text> element on serialization. That element also carries the uci:node-id attribute, which holds the original Text node's id as obtained from the generate-id() XPath function when called on that node.

Name

MarkTextNodes

Java symbol

kMarkTextNodesParamName

Type

Bool

Value

true | false

Serialize node id as attribute @uci:node-id

When checked, each serialized element automatically receives an additional attribute uci:node-id, holding the value obtained by calling generate-id() on that element.

Name

SerializeNodeId

Java symbol

kSerializeNodeIdParamName

Type

Bool

Value

true | false

Pretty-print output

Turns on pretty-printing the output for elements whose whitespace handling mode is known explicitly to the serializer.

Name

PrettyPrint

Java symbol

kPrettyPrintParamName

Type

Bool

Value

true | false

Images

During import, e.g. using the RTF importer, all references to images are made absolute and stored this way in the internal tree as follows:

Embedded images are written to disk into a temporary location and possibly a format conversion is applied. The internal tree at this point holds the absolute path to these temporary image files.

Linked (or referenced) images are stored with their absolute path to the original image; no matching files for linked images are created in the temporary image files location.

At export time, you can decide how the image location information (and possibly the actual image files) should be handled. The handling mode can be set individually for images that were embedded in the original document and (external) images that were only linked to.

Embedded Images

This parameter governs the handling of images that originally had been embedded in the source document.

remove image from Destination File

The uci:image element is completely dropped from the XML output.

copy to Image DestinationFolder (new file)

This option copies the temporary image file to the Image Destination Folder. If a file of the desired name already exists at that location, a unique file name is generated by appending -1, -2 etc. to the basename until a name is found that is not already used. A relative reference to this copy is then used in the uci:image element.

copy to Image DestinationFolder (replacing)

This option copies the temporary image file to the Image Destination Folder. If a file of the desired name already exists at that location, it is overwritten without prompting. A relative reference to this copy is then used in the uci:image element.

internal tree format (don’t copy)

This option writes the absolute path to the temporary file unchanged, exactly as it is currently set in the internal tree. This is useful for checking what the internal tree looks like at a certain point in a module chain.

Important

The temporary image files will be deleted automatically after a pipeline execution for a certain document. This means that when using the internal tree format (don’t copy) option, the referenced image in the generated XML will have been deleted!

Name

EmbeddedImagesHandling

Java symbol

kEmbeddedImagesHandlingParamName

Type

String

Value

discard | copy | copyreplace | internal
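
The naming rule of the "new file" copy option can be illustrated with a minimal Python sketch. It is not upCast's implementation: the function name unique_destination is hypothetical, and placing the -1, -2 suffix between basename and extension is an assumption based on the description above.

```python
from pathlib import Path


def unique_destination(folder: Path, filename: str) -> Path:
    """Return a path inside `folder` that is not used yet.

    Follows the documented rule: append -1, -2, ... to the basename
    until an unused name is found. Keeping the extension after the
    suffix is an assumption of this sketch.
    """
    candidate = folder / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        candidate = folder / f"{stem}-{counter}{suffix}"
        counter += 1
    return candidate
```

With the "replacing" variant, no such search takes place; the first candidate name is used and an existing file is overwritten.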

Referenced images

This parameter governs the handling of linked (referenced) images in the original source file.

remove image from Destination File

The uci:image element is completely dropped from the XML output.

copy original to Image DestinationFolder (new file), update link

This option copies the original, linked-to image file to the Image Destination Folder. If a file of the desired name already exists at that location, a unique file name is generated by appending -1, -2 etc. to the basename until a name is found that is not already used. A relative reference to this copy is then used in the uci:image element.

If the original file is not accessible from the machine that executes the pipeline (be it due to network failure, the file not existing or some other problem), the option update link for Destination File location is used as fallback instead.

copy original to Image DestinationFolder (replacing), update link

This option copies the original, linked-to image file to the Image Destination Folder. If a file of the desired name already exists at that location, it is overwritten without prompting. A relative reference to this copy is then used in the uci:image element.

If the original file is not accessible from the machine that executes the pipeline (be it due to network failure, the file not existing or some other problem), the option update link for Destination File location is used as fallback instead.

keep verbatim link to original

This option writes the reference the same way it was found in the original source document. This therefore may be an absolute or relative path.

Note that if the original location specification in the source RTF file was relative, but the XML file is not saved into the same folder as where the source document is located, chances are that the link is broken.

update link for Destination File location

This option updates the reference to the original image in a way that it still points to that very image, even when the destination of the XML file is in a different folder (when the original reference was relative).

Name

LinkedImagesHandling

Java symbol

kLinkedImagesHandlingParamName

Type

String

Value

discard | copy | copyreplace | keep | update

Image Destination Folder

You can specify a separate folder dedicated for images. By default, this is set to ${module:DestinationFile#urlpath}, which evaluates to the same folder where the XML file is saved. However, if you want to put images into a separate folder, you can do this here. This is the folder where any of the above options that physically copy the image file will place the file. Any relative references to the image from within the XML file will be adjusted accordingly.

Name

ImageDestinationFolder

Java symbol

kImageDestinationFolderParamName

Type

String

Value

absolute path to folder

Filter

The nodes in the internal tree may have a very rich set of attributes attached, many of which have only been useful while processing the tree within upCast, e.g. with UPL. Serializing all those attributes may create huge files, where only a fraction of the info contained will be used down the further processing chain of the document. To reduce unnecessary memory consumption and processing time, the XML Exporter offers a way to set up a filter on the attributes serialized for each internal tree node. This is achieved by using a specially formed UPL program in conjunction with the dedicated filtering function filter-attrs().

Tip

This filter can be effectively used to reduce the set of CSS properties exploded into attributes to a minimal set that you are actually interested in for further processing, e.g. in an XSLT step.

Attribute Filter

This field holds the UPL program to perform the filtering.

As in the UPL Tree-Processor, you can define several UPL rules. The selector part determines for which kind of node (and possibly more complex conditions) the attribute filter applies. This lets you filter attributes differently on different elements.

The action part is applied when the selector matches. Although you can theoretically use the complete range of UPL functionality on such a node, many changes to the node will not be picked up by the serializer (except for changes in the node’s attributes), so we recommend against using this UPL program for anything other than filtering attributes.

It is important to understand what the context node supplied to the UPL program looks like:

  • The context node supplied to the UPL program is a temporary, newly created, artificial, single node. It lives by itself and has neither a parent, nor siblings, nor children. It is neither the node in the context of its later serialization nor the actual node of the internal tree to be serialized, but merely a lookalike of the former. This means, among other things, that you cannot query its context nodes with XPath using eval-xpath().

  • The context node does not hold synthesized style info, nor does it hold attached user values.

The filtering UPL code is not called for nodes of other DOM node types than Element.

Clicking the Insert defaults button inserts the current upCast default filter setup for new XML Exporter instances before any existing code in the Attribute Filter text field.

Name

SerializationFilter

Java symbol

kSerializationFilterParamName

Type

String

Value

Maps

Unicode translation map

This field lets you enter a Unicode Translation Map. You can enter any mappings directly or include an externally created Unicode Translation Map file using the include realm: ${include(encoding="…"):file}.

Name

UnicodeTranslationMap

Java symbol

kUnicodeTranslationMapParamName

Type

String

Value

Unicode translation map code

CSS property unit map

Here, you can specify a mapping table that associates any CSS <length> property with a pair of {unit, precision}. When the module needs to write length or size information in the form of CSS properties, it consults this list to determine which length unit to use at which precision. For a description of the format, see CSS property unit table.

You can enter any mappings directly or include an externally created CSS property unit table file using the include realm: ${include(encoding="…"):file}.

Name

CSSPropertyUnitMap

Java symbol

kCSSPropertyUnitMapParamName

Type

String

Value

CSS property unit map code

9. Commandline Processor [commandline]

This module serves for executing external system commands by way of the standard command-line interpreter available on the respective execution platform.

System command

The command to be executed by the underlying system’s command-line interpreter. You can use upCast variables for building the string.

For platform-independent, common file operations, upCast offers some internal "pseudo" commands:

upcast:delete-file filename+

Deletes all listed files.

upcast:copy-file source dest

Copies the file source to the new file dest.

upcast:move-file from to

Moves the file from to its new destination to. This is equivalent to the sequence of commands upcast:copy-file from to followed by upcast:delete-file from.

upcast:delete-recursively folder-or-file+

Recursively deletes all listed folders and/or files.

This command is potentially dangerous as it can lead to deleting a huge number of files when used carelessly! Please consider using upcast:delete-recursively-restricted instead.

upcast:delete-recursively-restricted deletionboundary folder-or-file*

Recursively deletes all listed folders and/or files that are equal or reside below the specified deletionboundary folder in the file system hierarchy.

This method is fail-fast: when a folder or file specified for deletion is neither equal to nor located below the deletion boundary, it is skipped entirely. This prevents the case where you specify a folder that is an ancestor of deletionboundary and, as a result, the complete contents of deletionboundary are deleted. In other words: the specified root path of a recursive deletion operation must itself satisfy the deletion boundary restriction to be considered any further.

Example 8.5. 

upcast:delete-recursively-restricted "/user/iloop/temp/" "/user/iloop/temp/test.txt"

deletes the file /user/iloop/temp/test.txt because it is a descendant of the deletion boundary folder /user/iloop/temp/.

upcast:delete-recursively-restricted "/user/iloop/temp/" "/user/iloop/"

deletes nothing because the folder /user/iloop/ is not a descendant of the deletion boundary folder /user/iloop/temp/.
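
The boundary check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not upCast's code: the function names are hypothetical, the comparison is purely lexical on POSIX-style paths, and ".." segments or symbolic links are not resolved.

```python
from pathlib import PurePosixPath
from typing import List


def is_within_boundary(boundary: str, target: str) -> bool:
    """True if `target` equals `boundary` or lies below it (lexical check)."""
    boundary_path = PurePosixPath(boundary)
    target_path = PurePosixPath(target)
    return target_path == boundary_path or boundary_path in target_path.parents


def deletable_roots(boundary: str, targets: List[str]) -> List[str]:
    """Keep only the roots that satisfy the boundary; all others are skipped."""
    return [target for target in targets if is_within_boundary(boundary, target)]
```

Applied to the two calls of the example, the first target passes the check and the second is rejected, because /user/iloop/ is an ancestor of the boundary rather than a descendant.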


Name

Commandline

Java symbol

kCommandlineParamName

Type

String

Value

commandline to execute, either as String or (in UPL or Java API) as List

Environment

This parameter lets you supplement, override or completely replace the environment that the child process started by the module inherits from its parent (i.e., upCast).

For this, we use the following syntax:

paramname ‘:=’ ‘"’ value ‘"’;

Each variable definition takes one line of text.

Note

Note that you must use CSS-style escapes (or numerical character entities of the form &#...;) to specify Unicode characters outside the ASCII range.

All lines starting with // denote a comment line and are ignored, as are empty lines.

You can also specify the mode in which the specified entries are handled. Put one of the following three mode options at the top of the environment specification, on a line of its own:

@mode replace
@mode override
@mode supplement

@mode

This option controls the behavior of the environment variable entries:

replace

The complete environment is cleared, then the new entries are added

override

Specified variables are added to the environment, replacing already existing ones

supplement

Specified variables are added to the environment unless a variable of that name already exists, in which case the already existing one is kept unchanged and the newly specified one is discarded

The default mode is override.

Example 8.6. 

Writing

@mode replace

at the top of the environment variables definition code snippet will clear the current environment, so that only the definitions that follow are added.


Example 8.7. Environment specification examples

With this inherited environment:

PATH=/usr/bin
USER=iloop

Given that inherited environment, the following specifications for Environment result in the final environment variable sets shown:

Example 1

@mode override
USER:="johndoe"
SAMPLEVAR:="test"

results in

PATH=/usr/bin
USER=johndoe
SAMPLEVAR=test

Example 2

@mode replace
USER:="johndoe"
SAMPLEVAR:="test"

results in

USER=johndoe
SAMPLEVAR=test

Example 3

@mode supplement
USER:="johndoe"
SAMPLEVAR:="test"

results in

PATH=/usr/bin
USER=iloop
SAMPLEVAR=test

Note how in supplement mode, existing variables of the same name (here: USER) will keep their values unchanged.
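
For readers who prefer code to prose, the three modes can be modelled as a small Python function. This is a sketch of the documented behavior only; the function name apply_environment and the use of dictionaries are assumptions made for illustration.

```python
from typing import Dict


def apply_environment(inherited: Dict[str, str],
                      specified: Dict[str, str],
                      mode: str = "override") -> Dict[str, str]:
    """Combine the inherited environment with the specified entries."""
    if mode == "replace":
        # The inherited environment is dropped completely.
        return dict(specified)
    result = dict(inherited)
    if mode == "override":
        # Specified entries win over inherited ones of the same name.
        result.update(specified)
    elif mode == "supplement":
        # Inherited entries win; only new names are added.
        for name, value in specified.items():
            result.setdefault(name, value)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return result
```

Calling the function with the inherited environment and the entries from Example 8.7 reproduces the three result sets shown there.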


Name

CommandlineEnvvars

Java symbol

kCommandlineEnvvarsParamName

Type

String

Value

["@mode replace" | "@mode supplement" | "@mode override"] envvarname=value*

Wait for completion

When checked, the command is executed synchronously, i.e. upCast waits until the external command has completed before continuing execution.

Important

Checking for errors occurring during external command execution can only be performed when this option is on. upCast considers any return value other than 0 (zero) an error.

Name

WaitForCompletion

Java symbol

kWaitForCompletionParamName

Type

Bool

Value

true | false

Timeout

Lets you specify the timeout for the external command in seconds. When the command does not exit and return within the specified time, it is forcibly killed by the module and an error message (WatchedCommandKilledWithTimeout) is issued.

The timeout value is only active when Wait for completion is checked.

A timeout value of "0" waits indefinitely for the termination of the command.

Name

WaitForCompletionTimeout

Java symbol

kWaitForCompletionTimeoutParamName

Type

Integer

Value

seconds (positive integer value greater than 0; when 0: wait indefinitely)
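
The interplay of Wait for completion, Timeout and the exit code check can be approximated with Python's standard subprocess module. The sketch below is an analogy, not a description of how the module works internally; the function names effective_timeout and run_and_wait are hypothetical.

```python
import subprocess
import sys
from typing import List, Optional


def effective_timeout(seconds: int) -> Optional[int]:
    """Map the documented value to subprocess semantics: 0 means no limit."""
    if seconds < 0:
        raise ValueError("timeout must not be negative")
    return None if seconds == 0 else seconds


def run_and_wait(command: List[str], timeout_seconds: int = 0) -> int:
    """Run `command` synchronously and return its exit code.

    subprocess.run kills the child process and raises TimeoutExpired
    when the limit is exceeded. A return value other than 0 would be
    treated as an error by the caller.
    """
    completed = subprocess.run(command, timeout=effective_timeout(timeout_seconds))
    return completed.returncode


if __name__ == "__main__":
    # Runs the current Python interpreter as a harmless external command.
    print(run_and_wait([sys.executable, "-c", "pass"], timeout_seconds=30))
```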

Redirect 'stdout' to…

This parameter lets you redirect the data the command writes to stdout either to a file or into a pipeline variable.

By default, when the field is left empty, the command's output to stdout will be written to upCast's log with a level of INFO.

When you specify an absolute file path, the output will be written to that file. Any existing file content is cleared each time the module executes.

When you use the special syntax upcast:varname, the data sent to stdout by the command is written to the pipeline variable varname as String. You then can retrieve it via

${pipeline:varname}

in the GUI fields of upCast, or via

$pipeline:varname

from UPL code.

Note

Writing to a pipeline variable requires that Wait for completion is turned on.

Name

RedirectStdout

Java symbol

kRedirectStdoutParamName

Type

String

Value

'' | upcast:pipeline-variable-name | absolute-filename
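
The three forms this field accepts can be told apart as in the following Python sketch. The function name classify_redirect and the returned labels are hypothetical and only serve to illustrate the rules stated above.

```python
from typing import Tuple

VARIABLE_PREFIX = "upcast:"


def classify_redirect(value: str) -> Tuple[str, str]:
    """Return (kind, target) for a redirect field value.

    kind is "log" for an empty field, "variable" for upcast:varname,
    and "file" for anything else (expected to be an absolute path).
    """
    if value == "":
        return ("log", "")
    if value.startswith(VARIABLE_PREFIX):
        return ("variable", value[len(VARIABLE_PREFIX):])
    return ("file", value)
```

For example, the value upcast:cmdOutput (a made-up variable name) would be classified as a pipeline variable named cmdOutput, whose content you then read via ${pipeline:cmdOutput}.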

Redirect 'stderr' to…

This parameter lets you redirect the data the command writes to stderr either to a file or into a pipeline variable.

By default, when the field is left empty, the command's output to stderr will be written to upCast's log with a level of ERROR.

When you specify an absolute file path, the output will be written to that file. Any existing file content is cleared each time the module executes.

When you use the special syntax upcast:varname, the data sent to stderr by the command is written to the pipeline variable varname as String. You then can retrieve it via

${pipeline:varname}

in the GUI fields of upCast, or via

$pipeline:varname

from UPL code.

Note

Writing to a pipeline variable requires that Wait for completion is turned on.

Name

RedirectStderr

Java symbol

kRedirectStderrParamName

Type

String

Value

'' | upcast:pipeline-variable-name | absolute-filename

Example 8.8. 

To create a new directory images in the folder specified by the pipeline variable DestinationFolder on a Unix system, you would use the following command:

mkdir "${pipeline:DestinationFolder#localpath}/images"

Note the quotes around the parameter, which accommodate path names that contain e.g. space characters.


10. XSLT Processor [xslt]

This module lets you apply an XSLT transformation to some external file (which might be the result of an earlier exporter module). You can choose between the Xalan XSLT processor from the Apache Software Foundation (ASF; http://xml.apache.org/), Saxon 6.5.5 by Michael Kay, or Saxon-B (version 9) from Saxonica (http://www.saxonica.com).

Source File

Specify the file the transformation should be applied to, most probably an XML file. You can use all upCast variables for dynamically creating the full path to the file.

Name

SourceFile

Java symbol

kSourceFileParamName

Type

Object

Value

absolute file path in URL or local file system convention

XSL Transformation File(s)

Specify the XSLT transformation ("XSLT file") to apply.

You can specify several transformations by giving the full path to one XSLT file per line, i.e. the paths must be separated by a newline character. The transformations are chained: the original source file is processed using the first XSLT file specified, the result is processed by the second, and so on. Note, however, that all transformations share the same XSLT parameters.

Empty lines are ignored.

Lines starting with a hash mark ('#') or two forward slashes ('//') are considered comments and ignored.

Note

Use this for documentation purposes or to quickly disable one of the stylesheets in a processing chain by prefixing its line with a # .
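
How the content of this field breaks down into a processing chain can be sketched in Python as follows. The function name parse_stylesheet_list is hypothetical, and tolerating whitespace before a comment marker is an assumption of this sketch.

```python
from typing import List


def parse_stylesheet_list(field_text: str) -> List[str]:
    """Return the stylesheet paths in chain order.

    Empty lines and lines starting with '#' or '//' are skipped.
    """
    paths = []
    for line in field_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or stripped.startswith("//"):
            continue
        paths.append(stripped)
    return paths
```

A field containing three paths, with the second one prefixed by #, therefore yields a chain of two transformations.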

Name

Stylesheet

Java symbol

kStylesheetParamName

Type

String

Value

path to stylesheet

XSLT parameters

Lets you specify parameters to be passed to the transformation. A parameter definition must follow this syntax:

paramname ‘:=’ ‘"’ value ‘"’;

Quotes within the parameter value must themselves be quoted using the backslash character ‘\’.

You may use upCast’s variable system for constructing parameter values.

Important

The text in this field and any contained variable references are resolved as follows:

  1. Any references to the include realm are resolved.

  2. The individual assignments are parsed according to the above syntax.

  3. Only now, any further references to realms other than include are resolved, separately for the value part of each parameter assignment.

This algorithm covers the usual cases where you might want to include constant parameter assignment code shared by several XSLT Processor modules in a pipeline using an include variable reference, but also reference variable values without having to worry whether their actual value breaks the value assignment syntax described above.
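
Steps 2 and 3 of this algorithm can be illustrated with a short Python sketch. It is not upCast's parser: the regular expression, the function name parse_parameters and the resolve callback are assumptions, and the trailing semicolon is treated as optional because the syntax line does not make clear whether it is literal.

```python
import re
from typing import Callable, Dict

# name := "value", where \" may appear inside the value.
ASSIGNMENT = re.compile(r'([^\s:=;]+)\s*:=\s*"((?:\\.|[^"\\])*)"\s*;?')


def parse_parameters(field_text: str,
                     resolve: Callable[[str], str] = lambda value: value
                     ) -> Dict[str, str]:
    """Parse all assignments first, then resolve variables per value."""
    params: Dict[str, str] = {}
    for match in ASSIGNMENT.finditer(field_text):
        name, raw_value = match.groups()
        # Undo the backslash quoting of embedded quotes.
        unescaped = raw_value.replace('\\"', '"')
        # Resolution happens after parsing, so a resolved value that
        # contains quotes cannot break the assignment syntax.
        params[name] = resolve(unescaped)
    return params
```

Because resolve is applied to each already isolated value, a variable whose content includes a quote character ends up in the parameter value verbatim instead of terminating the string early.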

Tip

There is a specially named parameter: il.stylesheet.intermediates.folder

When this is specified and set to a writable folder on disk, the intermediate result after each step of an XSLT processing chain is serialized to a separate file in that folder, with the number in the file name indicating after which step the respective file was serialized.

Name

StylesheetParameters

Java symbol

kStylesheetParametersParamName

Type

String

Value

(string in same format as in UI)

Result file

Specify where the transformation result should be written. You can use all upCast variables for dynamically creating the full path to the file.

Name

DestinationFile

Java symbol

kDestinationFileParamName

Type

String

Value

absolute path to desired result file

XSLT processor

Lets you choose between Xalan, Saxon 6.x and Saxon 9.x as the XSLT processor to use (if available).

Name

XSLTProcessor

Java symbol

kXSLTProcessorParamName

Type

String

Value

xalan | saxon6 | saxon

11. Unicode Translation Processor [unicodetranslator]

This module lets you apply a Unicode Translation Map to an already existing XML document. Additionally, by way of the Output encoding parameter, you can quickly change the character encoding used in an XML file.

The implementation tries to preserve the formatting of the original document, but there is no guarantee that the result is syntactically equivalent to the input; structurally, of course, it is.

The Unicode Translation Map rules are only applied to the XML document’s text and attribute nodes. Comments and PIs are left unchanged.

Source File

Specify the file the transformation should be applied to, which must be an XML file. You can use all upCast variables for dynamically creating the full path to the file.

Name

SourceFile

Java symbol

kSourceFileParamName

Type

Object

Value

absolute file path in URL or local file system convention

Unicode Translation Map

This field lets you enter a Unicode Translation Map. You can enter any mappings directly or include an externally created Unicode Translation Map file using the ${include(encoding:="…"):file} variable reference, which is automatically replaced by the contents of the specified file after reading it using the specified encoding.

When you leave this field completely empty, no Unicode translation is performed. You can use this if the only thing you want to do is changing the character encoding the XML file is in by specifying the desired Output encoding.

Name

UnicodeTranslationMap

Java symbol

kUnicodeTranslationMapParamName

Type

String

Value

Unicode translation map code

Destination file

Specify where the translation result should be written to. You can use all upCast variables for dynamically creating the full path to the file.

Name

DestinationFile

Java symbol

kDestinationFileParamName

Type

String

Value

absolute path to desired result file

XML version attribute

Specify the value of the version attribute on the XML declaration at the beginning of the result XML file.

If you leave this empty, no XML declaration will be written. The default value is "1.0".

Note that this is a textual parameter only; specifying e.g. "1.1" does not modify the file written such that it is a valid XML 1.1 file.

Name

XMLVersion

Java symbol

kXMLVersionParamName

Type

String

Value

value to be written in the 'version' attribute of the XML declaration; when empty, XML declaration is suppressed

Output encoding

Lets you specify the name of a supported output file encoding, e.g. UTF-8 or iso-8859-1. This encoding is also specified in the encoding attribute on the XML declaration (if written, see the XML version attribute parameter above).

Name

OutputEncoding

Java symbol

kOutputEncodingParamName

Type

String

Value

Java encoding name

DOCTYPE declaration

This lets you add a doctype declaration, or pass through, override or remove one that exists in the incoming document.

  • When this field is a single asterisk ("*"), the doctype declaration in the source document (if present) is passed through as-is.

  • When this field is empty (""), any doctype declaration present in the source document is stripped from the output.

  • When this field contains any other data, that data is written verbatim to the output, replacing any possibly existing doctype declaration in the input document.
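
The three cases above amount to a simple decision, sketched here in Python. The function name output_doctype is hypothetical, and None stands for "no doctype declaration".

```python
from typing import Optional


def output_doctype(field_value: str, source_doctype: Optional[str]) -> Optional[str]:
    """Return the doctype declaration to write, or None for none at all."""
    if field_value == "*":
        # Pass through whatever the source document has (possibly nothing).
        return source_doctype
    if field_value == "":
        # Strip any declaration present in the source.
        return None
    # Any other content is written verbatim.
    return field_value
```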

Name

DOCTYPEDeclaration

Java symbol

kDOCTYPEDeclarationParamName

Type

String

Value

literal value of full DOCTYPE declaration as String; when empty, DOCTYPE declaration is removed, when '*', DOCTYPE declaration is copied as in source

12. XML Validator [validator]

This module serves for validating arbitrary XML documents. The module supports validation against an XML DTD, XML Schema and Relax NG.

Source File

Specify the XML file that should be validated. You can use all upCast variables for dynamically creating the full path to the file.

Name

SourceFile

Java symbol

kSourceFileParamName

Type

Object

Value

absolute file path in URL or local file system convention

Redirect report to

Specify a destination file where the validation report will be written to.

When you specify an absolute file path, the output will be written to that file. Any existing file content is cleared each time the module executes.

When you use the special syntax upcast:varname, the report data is written to the pipeline variable varname as String. You then can retrieve it via

${pipeline:varname}

in the GUI fields of upCast, or via

$pipeline:varname

from UPL code.

The validation report is an XML file in UTF-8 encoding with root element <validation-report>. It has child messages of the following form:

<msg 
    system-id="validated-file-url" 
    line="line-number" 
    col="column-number">
  ...validation message...
</msg>

The validation message is constructed from the respective schema type's validation handler's error message (SAXParseException).
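
A report of this shape is easy to post-process. The following Python sketch reads the messages with the standard xml.etree.ElementTree module; the sample report, its file URL and its message text are made up for illustration, and the sketch assumes that line and col always hold integers.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical report content in the documented format.
SAMPLE_REPORT = """<validation-report>
  <msg system-id="file:/tmp/doc.xml" line="12" col="7">
    Element type "para" must be declared.
  </msg>
</validation-report>"""


def read_messages(report_xml: str) -> List[Tuple[str, int, int, str]]:
    """Return (system-id, line, column, message) for each <msg> element."""
    root = ET.fromstring(report_xml)
    messages = []
    for msg in root.iter("msg"):
        messages.append((
            msg.get("system-id", ""),
            int(msg.get("line", "0")),
            int(msg.get("col", "0")),
            (msg.text or "").strip(),
        ))
    return messages
```

An empty list returned for a report means that it contains no messages.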

Name

ReportDestination

Java symbol

kReportDestinationParamName

Type

String

Value

'' | upcast:pipeline-variable-name | absolute-filename

Schema type

Specify the type of Schema you want to validate the file against:

XML DTD

validate against an XML DTD; the document to be validated must have a valid DOCTYPE declaration

XML Schema

validate against an XML Schema; the document must have the respective schema file location attributes

Relax NG

validate against a Relax NG schema; you must specify the location and type of the Relax NG schema file using the specific parameters shown when this type is selected (see below)

Name

SchemaType

Java symbol

kSchemaTypeParamName

Type

String

Value

dtd | xmlschema | relaxng

External DTD subset fallback

(for XML DTD schema type only)

This parameter lets you choose a fallback for the external DTD subset for the document to be validated. This is useful if you want to validate an XML document that does not have a DOCTYPE declaration, but you have an XML DTD that you know the imported document must satisfy. Specifying the external DTD subset here (DTD file) allows you to supply that info to the parser, which in turn uses that info for validation.

Essentially, specifying a file here has the same effect as if the source XML document had a

<!DOCTYPE root SYSTEM "specified-file">

declaration, unless it explicitly specifies a DOCTYPE declaration of its own (in which case the latter takes precedence).

Name

ExternalSubsetLocation

Java symbol

kExternalSubsetLocationParamName

Type

String

Value

Relax NG Schema file

(for Relax NG schema type only)

Specify the location of the Relax NG schema file to validate the Source File against.

Name

RelaxSchemaLocation

Java symbol

kRelaxSchemaLocationParamName

Type

String

Value

absolute file path

Relax NG Syntax

(for Relax NG schema type only)

Specify the syntax the Relax NG schema file is written in, either XML syntax or compact syntax.

Name

RelaxSyntax

Java symbol

kRelaxSyntaxParamName

Type

String

Value

xml | compact

13. CSS Exporter [css]

This module writes an external Cascading Style Sheets, level 2 (CSS2) file comprising all styles (paragraph styles and character styles) used in the current internal document, matching their visual appearance as closely as reasonably possible. The output also includes information on the page setup like paper size and margins.

The CSS2 file written may for example be referenced by a file created by the XML Exporter module.

Selector syntax

Lets you choose which CSS selector syntax should be used:

CSS1 (‘class’ shorthand)

Writes selectors using the ‘class’ attribute shorthand: .classname { ... }

CSS2 Selectors

Writes selectors according to CSS2 selector syntax rules: *[class=classname] { ... }

CSS1+CSS2

Writes both ways of expressing the selector so that tools understanding only one of them can pick the one they understand. The shorthand is written first, followed by the full CSS2 selector.

Name

SelectorSyntax

Java symbol

kSelectorSyntaxParamName

Type

String

Value

css1 | css2 | all
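
The effect of the three values on a class selector can be sketched in Python. The function name class_selectors is hypothetical; the sketch only shows which selector forms are produced and in which order, not how the module lays out the resulting rules in the file.

```python
from typing import List


def class_selectors(classname: str, syntax: str) -> List[str]:
    """Return the selector form(s) for `classname` under the given syntax."""
    shorthand = f".{classname}"           # CSS1 'class' shorthand
    attribute = f"*[class={classname}]"   # CSS2 attribute selector
    if syntax == "css1":
        return [shorthand]
    if syntax == "css2":
        return [attribute]
    if syntax == "all":
        # Shorthand first, followed by the full CSS2 selector.
        return [shorthand, attribute]
    raise ValueError(f"unknown selector syntax: {syntax}")
```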

upCast DTD elements namespace prefix

Specify the namespace prefix used for upCast DTD elements in the final XML file that references the CSS file generated by this module.

The default is the empty string, i.e. no namespace prefix used.

Note

Setting this parameter is necessary until widespread support for the CSS Namespaces Module is available. Until then, element names are bound by their qualified name, including namespace prefix plus separating colon (if existent). To generate the qualified element name, the module must be told the namespace prefixes it should use.

Name

UpcastDTDNamespacePrefix

Java symbol

kUpcastDTDNamespacePrefixParamName

Type

String

Value

prefix for elements in upCast DTD

HTML4 DTD elements namespace prefix

Specify the namespace prefix used for HTML4 elements in the final XML file that references the CSS file generated by this module. HTML elements are e.g. used for tables (if you opted for the HTML table model).

The default is html.

Name

HTML4DTDNamespacePrefix

Java symbol

kHTML4DTDNamespacePrefixParamName

Type

String

Value

the desired namespace prefix

Output file

Specify where the CSS file should be written. You can use all upCast variables for dynamically creating the full path to the file.

Name

DestinationFile

Java symbol

kDestinationFileParamName

Type

String

Value

absolute path to desired result file

Output encoding

Lets you specify a name of a supported output file encoding, e.g. UTF-8 or iso-8859-1. This encoding is also specified in the @charset rule at the very beginning of the CSS file.

Name

OutputEncoding

Java symbol

kOutputEncodingParamName

Type

String

Value

Java encoding name

14. RTF Exporter ("downCast") [rtfexport]

Note

This module requires an appropriate RTF Exporter feature included in your license to be fully functional.

The RTF Exporter was formerly a separate product called "downCast". This module is a much improved version of downCast 1.x, especially with respect to performance (up to 300% faster).

This module converts XML documents to Word or, more precisely, RTF documents. For specifying the layout, the module relies on a subset of Cascading Style Sheets, level 2 (CSS2) properties, amended by several proprietary properties where needed. Input XML documents must either be valid against the upCast DTD (note that this is different from the upCast internal DTD!), or they can be any arbitrary XML language for which a transformation into the upCast DTD can (and needs to) be created.

For more details on supported CSS and custom properties and their semantics, see the separate RTF Exporter documentation.

Source File

Specify the XML file that should be converted to RTF. You can use all upCast variables for dynamically creating the full path to the file. This must be an XML file conforming to the upCast DTD or – in experimental status – XSL-FO.

Name

SourceFile

Java symbol

kSourceFileParamName

Type

Object

Value

absolute file path in URL or local file system convention

Destination file

Specify where the RTF result should be written to. You can use all upCast variables for dynamically creating the full path to the file.

When running on Windows and having WordLink installed and functional, by specifying the destination file extension as .doc, you can have the module automatically convert the generated RTF file into a Word binary file.

Name

DestinationFile

Java symbol

kDestinationFileParamName

Type

String

Value

absolute path to desired result file

Source format

Specify the format the source file is in, either upCast DTD or XSL-FO.

Name

SourceFormat

Java symbol

kSourceFormatParamName

Type

String

Value

upcast | xslfo

Output resolution

When the RTF exporter must include images that do not specify their resolution explicitly in the file, the application uses the value that you specify here to calculate image size and resulting scaling factor to apply in the RTF output.

The default value is 96 dpi.

Name

OutputResolution

Java symbol

kOutputResolutionParamName

Type

Double

Value

1..9999
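The role of the fallback resolution can be illustrated with a little arithmetic. The following is a minimal sketch, not the exporter's actual code: the function names are hypothetical, and it assumes that the natural image size is simply the pixel count divided by the resolution. The only fixed fact used is that RTF measures lengths in twips (1/1440 inch).

```python
TWIPS_PER_INCH = 1440  # RTF lengths are expressed in twips (1/1440 inch)


def image_size_twips(width_px, height_px, dpi=96.0):
    """Natural size of an image in twips, using a fallback resolution.

    Hypothetical helper: assumes size = pixels / dpi, with dpi playing
    the role of the OutputResolution parameter (allowed range 1..9999).
    """
    if not 1 <= dpi <= 9999:
        raise ValueError("resolution must be in the range 1..9999")
    return (round(width_px * TWIPS_PER_INCH / dpi),
            round(height_px * TWIPS_PER_INCH / dpi))


def scaling_percent(requested_twips, natural_twips):
    """Scaling factor in percent needed to reach a requested size."""
    return round(100 * requested_twips / natural_twips)


# A 192 x 96 pixel image without resolution information, at the 96 dpi default:
width, height = image_size_twips(192, 96)
# Scaling needed to fit that image into a requested width of one inch:
scale = scaling_percent(TWIPS_PER_INCH, width)
```

A higher resolution value yields a smaller natural size for the same pixel dimensions, and therefore a larger scaling factor for a given requested size.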

Missing/unsupported images

Specifies what the RTF Exporter should do when it encounters images in a format that it cannot handle or that are not supported in RTF, or when an image file to embed into a document is missing.

discard

the image is completely removed from the result

show filename only as inline text

error text indicating the file name of the missing image is embedded into the output document, prominently visible to the user

show full path as inline text

error text indicating the full, absolute path and file name of the missing image is embedded into the output document, prominently visible to the user

show detailed error message as inline text

error text indicating the full, absolute path and file name of the missing or unsupported image, including further error details, is embedded into the output document, prominently visible to the user

replace with generic image

a generic replacement image is embedded into the final result document, respecting and scaled to the originally requested image size so it does not break the layout of the document

Name

ImageErrorHandling

Java symbol

kImageErrorHandlingParamName

Type

String

Value

discard | filename | filepath | details | image

issue runtime error

When checked, missing or unsupported images cause a runtime error. When deselected, only a warning is generated. The message id is the same in both cases, however.

Name

ImageErrorSignalling

Java symbol

kImageErrorSignallingParamName

Type

String

Value

error | warning

User stylesheet

Here, you can specify a CSS stylesheet to use for the conversion instead of the stylesheet (possibly) specified in the XML source. You can use all upCast variables for dynamically creating the full path to that file.

Name

UserStylesheet

Java symbol

kUserStylesheetParamName

Type

String

Value

path to user stylesheet

Whitespace handler class

For experts only!

The RTF Exporter makes use of special code to handle whitespace characters in the input stream. This field lets you set a custom whitespace handler if this is required. A whitespace handler must be a Java class that implements the WhitespaceHandler interface. If you think you need to implement your own whitespace handler, please contact us directly at <support@infinity-loop.de> in advance.

The default value is ‘*’ (asterisk) which lets the implementation decide on the most appropriate whitespace handler for the input document and should not be changed for normal use.

The module provides three Whitespace Handlers for different situations. You request their explicit use by specifying their full, qualified class name in the Whitespace Handler class input field.

Important

Except for the NoopWhiteSpaceHandler, all are more or less experimental and we do not guarantee their correctness or usefulness.

de.infinityloop.downcast.rtflib.NoopWhiteSpaceHandler

This is the default handler for input documents valid according to the upCast DTD. All whitespace is significant in mixed-content elements. It is automatically used when you specify ‘*’ and the Source format is upCast DTD.

de.infinityloop.downcast.rtflib.XSLFOWhiteSpaceHandler

This is a white space minimizing handler, minimizing whitespace in mixed-content elements. It is automatically used when you specify ‘*’ and the Source format is XSL-FO. It tries to mimic XSL-FO required behavior when minimizing whitespace before and around inline elements. Whitespace is collapsed to the left.

de.infinityloop.downcast.rtflib.CSS3WhiteSpaceHandler

This handler behaves exactly the same as the XSLFOWhiteSpaceHandler, except that it respects the setting of the white-space CSS3 shorthand property or, more precisely, its all-space-treatment component when resolved to its constituent properties. When this has the value preserve, whitespace is preserved in that element, unless overridden in a child element. When this is collapse (the default), the handler behaves as described above. Note that you should explicitly specify the desired behavior on the immediate parent element of (possibly) mixed content.

ID rendering mode

For elements having an id attribute of type ID, you can specify if and how this information should be translated into RTF bookmarks.

don’t render

The ID information is not used and no bookmarks are created in the resulting RTF based on an id attribute.

before element only

A bookmark with the id’s value is created just before the start of the element’s contents.

after element only

A bookmark with the id’s value is created immediately after the full contents of the element has been written to the RTF file.

surround element

A bookmark with the id’s value is created that starts just before the start of the element’s contents and ends just after the full contents of the element has been written to RTF, i.e. the bookmark spans the contents of the element.

Name

IDRenderMode

Java symbol

kIdRenderModeParamName

Type

String

Value

surround | ignore | before | after
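The four modes can be pictured in terms of RTF's bookmark destinations, \bkmkstart and \bkmkend. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the exact control-word sequence the exporter emits is not documented here, so the output shown is only one plausible rendering of each mode.

```python
def render_with_bookmark(element_id, content_rtf, mode="surround"):
    """Return content_rtf with a bookmark placed according to mode.

    Hypothetical helper; mode uses the IDRenderMode values
    surround | ignore | before | after.
    """
    start = "{\\*\\bkmkstart " + element_id + "}"
    end = "{\\*\\bkmkend " + element_id + "}"
    if mode == "ignore":
        # No bookmark is created for the id attribute.
        return content_rtf
    if mode == "before":
        # Empty bookmark just before the element's contents.
        return start + end + content_rtf
    if mode == "after":
        # Empty bookmark immediately after the element's contents.
        return content_rtf + start + end
    if mode == "surround":
        # Bookmark spans the element's contents.
        return start + content_rtf + end
    raise ValueError("unknown ID render mode: " + mode)


fragment = render_with_bookmark("sec1", "Introduction", "surround")
```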

Style name output format

Determines how style names are written to the RTF stylesheet destination. With Unicode, Unicode characters are generally used to express characters such as umlauts; with normal (use document encoding), the document encoding is used wherever possible.

Name

StyleNameFormat

Java symbol

kStyleNameFormatParamName

Type

String

Value

unicode | normal

Table ‘frame’ attribute overrides cell border definitions

When checked, the frame attribute on table elements overrides any border settings of cells that lie on the outermost surrounding table border.

When not checked, a cell’s border CSS definition takes highest precedence in rendering.

Name

FrameOverridesCells

Java symbol

kFrameOverridesCellsParamName

Type

String

Value

true | false

15. External Pipeline Processor [extpipeline]

This module lets you execute another, external pipeline document as a sub-pipeline within the current pipeline execution.

Note

It is not possible to provide the external pipeline in the form of a Java Stream object; it must be an external file residing in the file system.

The result value of this module is the result value of the executed sub-pipeline.

Source File

The path to the external pipeline document (.ucdoc) to include in the current pipeline.

Name

SourceFile

Java symbol

kSourceFileParamName

Type

Object

Value

absolute file path in URL or local file system convention

Pipeline variables

This lets you choose how pipeline variables for the included pipeline should be created:

Use independent variables in sub-pipeline

the included pipeline gets its own, initially empty set of pipeline parameters. Think of this as running that pipeline as a completely independent pipeline.

Copy variables to sub-pipeline

this creates a copy of the current pipeline variables and passes it on to the included pipeline. This lets you pass all the current pipeline variables to the included pipeline. When the included pipeline modifies any variables, this only affects itself, but not the calling pipeline. This way, it is possible to provide values (like "parameters") to the included pipeline. When execution of the included pipeline finishes, the pipeline variables of the calling pipeline will be in exactly the same state as before running the included pipeline. Effectively, the included pipeline cannot have any side effects on the caller’s set of variables.

Share variables with sub-pipeline

in this mode, the included pipeline uses the same instance of pipeline variables as the caller. This means that the included pipeline receives and can modify the pipeline variables of the including pipeline. This way, it is possible to provide values (like "parameters") to the included pipeline, and have the included pipeline "return values" by setting them in the pipeline variables.

The only exception to this rule is the pipeline:base variable, which is not inherited but set according to the included pipeline’s location on disk so that relative references therein are resolved properly. After the sub-pipeline’s execution, the original value is restored for the pipeline:base variable before continuing in the calling pipeline.
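The three modes differ only in which variable pool the sub-pipeline sees. The following is a minimal sketch that models a pool as a plain dictionary; run_sub_pipeline and its arguments are hypothetical names for illustration and are not part of upCast's API. The body argument stands in for the modules of the included pipeline.

```python
def run_sub_pipeline(caller_vars, mode, sub_base, body):
    """Run body(pool) with a pool chosen by mode: separate | copy | share."""
    if mode == "separate":
        pool = {}                  # own, initially empty set of variables
    elif mode == "copy":
        pool = dict(caller_vars)   # changes stay inside the sub-pipeline
    elif mode == "share":
        pool = caller_vars         # same instance: changes reach the caller
    else:
        raise ValueError("unknown mode: " + mode)

    # Remember the caller's base before it is replaced for the sub-pipeline.
    had_base = "base" in caller_vars
    saved_base = caller_vars.get("base")
    # base always reflects the included pipeline's own location.
    pool["base"] = sub_base
    try:
        return body(pool)
    finally:
        if mode == "share":
            # Restore the caller's original base before continuing.
            if had_base:
                caller_vars["base"] = saved_base
            else:
                caller_vars.pop("base", None)


def example_body(pool):
    # Stand-in for the sub-pipeline: reads one variable, then changes it.
    seen = pool.get("title")
    pool["title"] = "changed"
    return (pool["base"], seen)


main_vars = {"base": "/main", "title": "original"}
result = run_sub_pipeline(main_vars, "copy", "/sub", example_body)
```

In copy mode the caller's dictionary is untouched afterwards; in share mode the changed title would be visible to the caller, while base is restored.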

For finer control over how an individual parameter behaves in sub-pipeline execution environments, each parameter has a property that governs when it is set: initialize-when, with the values never, always, unset or an arbitrary <string-value>. The default value is unset. Here’s an outline of what happens with respect to pipeline parameters during a sub-pipeline call in all of the three cases above:

  1. The pipeline variables pool of the sub-pipeline to be called is initialized or created according to the above parameter.

  2. The following pipeline variables are set to their appropriate values depending on the storage location of the sub-pipeline: base, PipelineBase, ParamBase, PipelineURI, ParamURI, PipelineInstanceId.

  3. Any sub-pipeline parameters specified in the Parameters field are written to the sub-pipeline's variable realm.

  4. Finally, for each parameter defined in the sub-pipeline:

    1. If the parameter’s initialize-when value is unset (or the property is not defined) and the pipeline variable pool does not already contain a variable by that name:

      1. If the parameter is a persistent parameter, a new variable is created in the pipeline variables with that parameter’s current value as stored in the sub-pipeline document as its value.

      2. Otherwise, if the parameter is not a persistent parameter, but has a default value defined, a new pipeline variable is created in the pipeline variables with that parameter’s default value as its value.

      3. Otherwise, if it’s neither a persistent parameter nor has it a default value, that pipeline variable is left undefined (and will probably cause execution errors later if the pipeline tries to access that value). This condition is considered a programming error.

    2. If the parameter’s initialize-when value is always:

      1. If the parameter is a persistent parameter, a new variable is created (or a possibly existing variable overwritten) in the pipeline variables with that parameter’s current value as stored in the sub-pipeline document as its value.

      2. Otherwise, if the parameter is not a persistent parameter, but has a default value defined, a new pipeline variable is created (or a possibly existing variable overwritten) in the pipeline variables with that parameter’s default value as its value.

      3. Otherwise, if it’s neither a persistent parameter nor has it a default value, that pipeline variable is left undefined (and will probably cause execution errors later if the pipeline tries to access that value). This condition is considered a programming error.

    3. If the parameter’s initialize-when value is never, no further actions are taken.

    4. If the parameter’s initialize-when value is a string value and either the pipeline variables do not already contain a variable by that name or any existing variable by that name has the same string value as the string specified for the initialize-when value:

      1. If the parameter is a persistent parameter, a new variable is created (or a possibly existing variable overwritten) in the pipeline variables with that parameter’s current value as stored in the sub-pipeline document as its value.

      2. Otherwise, if the parameter is not a persistent parameter, but has a default value defined, a new pipeline variable is created (or a possibly existing variable overwritten) in the pipeline variables with that parameter’s default value as its value.

      3. Otherwise, if it’s neither a persistent parameter nor has it a default value, that pipeline variable is left undefined (and will probably cause execution errors later if the pipeline tries to access that value). This condition is considered a programming error.

Example 8.9. 

By creating a parameter definition

copyright {
  type: text;
  label: "Copyright notice:";
}

in the main pipeline, which calls a sub-pipeline with the definition

copyright {
  type: text;
  label: "Copyright notice:";
  default: "(c) 2008 My Company";
  initialize-when: "";
}

the sub-pipeline will have set the copyright pipeline variable to the "(c) 2008 My Company" default value if the user did not provide a value in the text field for Copyright notice in the main pipeline. This lets you implement some sort of default or fallback value mechanism for values that are used in a sub-pipeline if the value has not been set (or, to be exact: has been set to the empty string "") in the calling pipeline.
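The initialization rules of step 4 and the fallback behaviour of Example 8.9 can be sketched as follows. This is an illustration only: the variable pool is modelled as a dictionary, the names When and initialize_parameter are hypothetical, and the keywords unset, always and never are represented by an enumeration so that they cannot be confused with an arbitrary string value. Where the text says a variable "is left undefined", the sketch assumes the pool is simply not modified.

```python
from enum import Enum


class When(Enum):
    UNSET = "unset"
    ALWAYS = "always"
    NEVER = "never"


def initialize_parameter(pool, name, when=When.UNSET, persistent=False,
                         stored_value=None, default=None):
    """Apply one parameter definition of the sub-pipeline to the pool.

    when is a When keyword or a plain string (the <string-value> case).
    """
    exists = name in pool
    if when is When.NEVER:
        return
    if when is When.UNSET:
        applies = not exists
    elif when is When.ALWAYS:
        applies = True
    else:
        # String value: initialize if the variable is missing or if its
        # current value equals the given string.
        applies = (not exists) or pool[name] == when
    if not applies:
        return
    if persistent:
        pool[name] = stored_value   # value stored in the sub-pipeline document
    elif default is not None:
        pool[name] = default
    # Neither persistent nor default: assumed to leave the pool unchanged.


# Example 8.9: the caller passed an empty copyright notice.
pool = {"copyright": ""}
initialize_parameter(pool, "copyright", when="",
                     default="(c) 2008 My Company")
```

After the call, the copyright variable holds the sub-pipeline's default, because its previous value matched the empty string given for initialize-when.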


Name

PipelineRealmMode

Java symbol

kPipelineRealmModeParamName

Type

String

Value

separate | copy | share

Only run modules with ‘exported’ status

When checked, only modules that have the status exported set will be executed.

Tip

You can use this feature like this:

Develop your sub-pipeline on its own. For testing and debugging purposes, you will probably want to provide initial values (using an instance of the Pipeline Variables module) and debugging output within the pipeline using additional instances of the XML (Raw) Exporter modules. Now simply remove the exported status on these modules in the sub-pipeline and check the above option in the importing pipeline.

Effectively, this ensures that all debugging and setup code is only run when you run the sub-pipeline on its own (e.g. during development and isolated debugging), but does not run when the pipeline is included in any other pipelines. No further module activation/deactivation orgies to think of, all done automatically once set up as described – pretty neat, isn’t it?

Name

OnlyRunExportedModules

Java symbol

kOnlyRunExportedModulesParamName

Type

Bool

Value

true | false

Sub-pipeline Parameters

Parameters

Lets you specify parameters to be passed to the called sub-pipeline. This is especially useful when calling the sub-pipeline in Use independent variables in sub-pipeline or Copy variables to sub-pipeline mode. The parameters defined here are explicitly set in the pipeline realm of the sub-pipeline’s variables to the values specified here. This happens before any modules of the sub-pipeline run. Using this mechanism, it is possible to pass certain variable values to the sub-pipeline without having to share the pipeline variable pool with the calling pipeline. Note, however, that the resulting variable values cannot be passed from a sub-pipeline back to the calling pipeline.

A parameter definition must follow this syntax:

paramname ‘:=’ ‘"’ value ‘"’;

Quotes within the parameter value must themselves be quoted using the backslash character ‘\’.

You may use upCast’s variable system for constructing parameter values.

Important

The text in this field and any contained variable references are resolved as follows:

  1. Any references to the include realm are resolved.

  2. The individual assignments are parsed according to the above syntax.

  3. Only now, any further references to realms other than include are resolved, separately for the value part of each parameter assignment.

This algorithm covers the usual cases where you might want to include constant parameter assignment code shared by several External Pipeline Processor modules in a pipeline using an include variable reference, but also reference variable values without having to worry whether their actual value breaks the value assignment syntax described above.
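The three resolution steps can be sketched as follows. This is a simplified illustration, not upCast's parser: the regular expressions only approximate the assignment syntax given above, variable references are assumed to have the form ${realm:name}, and lookup is a hypothetical callback that returns the value of a reference.

```python
import re

VAR_REF = re.compile(r"\$\{(\w+):([^}]*)\}")
ASSIGNMENT = re.compile(r'\s*([\w.-]+)\s*:=\s*"((?:[^"\\]|\\.)*)"\s*;')


def resolve(text, lookup, include_only):
    """Replace references: only the include realm, or all other realms."""
    def replace(match):
        realm, name = match.group(1), match.group(2)
        if (realm == "include") != include_only:
            return match.group(0)      # leave for the other pass
        return lookup(realm, name)
    return VAR_REF.sub(replace, text)


def parse_parameters(field_text, lookup):
    # Step 1: resolve references to the include realm in the whole field.
    text = resolve(field_text, lookup, include_only=True)
    params = {}
    pos = 0
    # Step 2: parse the individual assignments.
    while text[pos:].strip():
        match = ASSIGNMENT.match(text, pos)
        if match is None:
            raise ValueError("malformed assignment near: " + text[pos:pos + 20])
        name = match.group(1)
        raw = re.sub(r"\\(.)", r"\1", match.group(2))  # undo backslash quoting
        # Step 3: resolve remaining references per value, after parsing,
        # so that a resolved value cannot break the assignment syntax.
        params[name] = resolve(raw, lookup, include_only=False)
        pos = match.end()
    return params


# Hypothetical variable contents for illustration.
values = {
    ("include", "common"): 'lang := "en";',
    ("pipeline", "title"): 'He said "hi"; x := "y"',
}
params = parse_parameters(
    '${include:common} title := "${pipeline:title}";',
    lambda realm, name: values[(realm, name)],
)
```

Because the pipeline variable is substituted only after parsing, the quotes and the semicolon inside its value end up in the title parameter instead of being read as a further assignment.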

Name

PipelineVariables

Java symbol

kPipelineVariablesParamName

Type

String

Value

(string in same syntax as in corresponding UI field)

Chapter 9. Parameter Sets

1. What parameter sets are

When working with pipelines, especially parameterized ones, it is often convenient to have different sets of parameter settings at hand for running the pipeline. For example, when you are converting documents in the DocBook DTD to your own, you may want to set different header info depending on whether the document is a technical article or a medical article. The conversion itself, however, is the same for both. In this case, you’d set up the actual conversion pipeline only once (with the benefit that both document types automatically see any improvements in that pipeline), but have different sets of parameters for the text in the document header area. So what you would do is create two parameter sets for that single pipeline document and store them in documents separate from the actual implementation logic. Depending on the document you need to convert, you’d just load the respective parameter set document and start the conversion, with all the parameters for that particular document type already set up correctly, so that you only have to specify the input file. Well – this is what parameter set documents are for! They separate actual parameter value storage from the pipeline implementation (where they are normally stored as part of the pipeline document).

2. What parameter sets contain

Parameter sets contain essentially only two types of data:

  • the current value of all Simple View parameters that have the persistent flag set to true

  • the Pipeline UID of the pipeline they are based on

That’s all. Parameter sets, in particular, do not contain any application or pipeline logic.

3. How parameter sets work

A parameter set is always derived from a single, specific pipeline document. The link to its implementing pipeline is established by way of the pipeline’s UID.

When loading a parameter set, what actually happens is that the pipeline it is based on is loaded automatically, and then the parameter values stored in the parameter set document are automatically set on that pipeline. To the user, it looks as if they had just opened a pipeline document in Simple View mode and then set the parameter values as stored in the parameter set. One difference is that the user cannot actually edit the pipeline implementation, or in other words, cannot switch to edit mode. A second, visual difference is that parameter sets open in a window with a blue background, whereas real pipelines are displayed with the default system background color.

When parameter values in a parameter set are edited, they can be saved back to the parameter set using the usual commands in the File menu: Save (to save in the same file, overwriting the old values) or Save As… to save the parameter set in a new file. You can also copy parameter set files on the operating system level and/or rename them.

Now, how do they find their pipeline implementation file when opened? This is done by (re-)purposing the well-known and established Catalog system. The only difference is that you do not resolve the PUBLIC identifier of some DTD or entity to an absolute file path, but the pipeline UID, which you can think of as the PUBLIC identifier of the pipeline document. This mechanism allows you to configure your system easily so that these Pipeline UIDs can be resolved to the actual, single implementation file from literally anywhere on your network: Just set up a single catalog for all your pipeline implementations and have your users add that file to upCast’s catalog system in the upCast Preferences, Catalog tab.

Example 9.1. Pipeline UID catalog example

A catalog might contain the following entries:

PUBLIC "d3546614-fb0e-4739-bfea-1f74280d9761" "file:///upcast/pipelines/docbook.ucdoc"
PUBLIC "ACME-XHTML-conversion-pipelineV1.1" "file:///upcast/pipelines/acme2html.ucdoc"

When you add this file to upCast’s catalog system, you can open a parameter set from anywhere on your local disk or even the LAN and have it automatically load and run the pipeline document it depends on.

In the first line, the Pipeline UID has been auto-generated by upCast and is using a standard UUID.

In the second line, a descriptive UID has been chosen by the pipeline author, who of course must ensure that this ID will never be used by any other pipeline a potential user will want to run using a parameter set file.
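Conceptually, the catalog in Example 9.1 is a lookup table from Pipeline UID to pipeline location. The following is a minimal sketch that reads only PUBLIC lines of exactly the form shown in the example; real catalog files can contain other entry types, which this hypothetical parse_catalog function simply ignores.

```python
import re

# One PUBLIC entry per line: PUBLIC "identifier" "location"
PUBLIC_ENTRY = re.compile(r'^\s*PUBLIC\s+"([^"]*)"\s+"([^"]*)"\s*$')


def parse_catalog(text):
    """Return a dict mapping each PUBLIC identifier to its location."""
    mapping = {}
    for line in text.splitlines():
        match = PUBLIC_ENTRY.match(line)
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping


catalog = parse_catalog(
    'PUBLIC "d3546614-fb0e-4739-bfea-1f74280d9761" "file:///upcast/pipelines/docbook.ucdoc"\n'
    'PUBLIC "ACME-XHTML-conversion-pipelineV1.1" "file:///upcast/pipelines/acme2html.ucdoc"\n'
)
location = catalog["ACME-XHTML-conversion-pipelineV1.1"]
```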


4. Creating parameter sets

The first parameter set for a certain pipeline must be created by opening the pipeline in upCast, then choosing File > Save to Parameter Set…. You will be prompted for a file name to save the parameter set to. Parameter set files always have the extension ucpar (short for upCast parameter set). The pipeline document will be closed and the new parameter set file will be opened in its place.

From there on, you can create additional instances either by repeating the above, or simply by saving copies of an open parameter set.

Note

Note that values are saved only for parameters that have their persistent property set to true in the implementing pipeline document. The decision on this property is up to the pipeline author. You will still see all parameters defined in the original pipeline when opening a parameter set; the values of non-persistent parameters will either be empty or filled with the default values the pipeline author has specified for them.

5. Variable: ${pipeline:ParamBase}

Even when loading a parameter set, be aware that the pipeline variable reference ${pipeline:base} will resolve to the folder where the implementing pipeline document is located, not where the parameter set document lives.

If you want to specify e.g. file path parameters relative to the location of the parameter set, you can use the new variable ${pipeline:ParamBase}, which is automatically created and holds the absolute path to the folder in which the respective parameter set resides on disk.

Note

Even for pipeline documents, ${pipeline:ParamBase} is always defined. In that case, it has the same value as ${pipeline:base}.

6. What happens when…

6.1. …the pipeline implementation’s number or type of parameters changes?

In this case, only those parameters that still have a counterpart in the pipeline are loaded from the parameter set; the parameter set is then automatically updated to the new parameter configuration. This is done on a best-effort basis. Values of incompatible parameters are discarded.

6.2. …I change a pipeline implementation while a depending parameter set is open?

When the changes do not affect the configuration of parameters, the pipeline implementation is re-loaded automatically once you click the Run button. This only works reliably when your file system delivers correct last-modified date information for files.

When the changes also affect the configuration (number, type, text, defaults etc.) of pipeline parameters, the parameter set detects this when re-loading the pipeline implementation and instructs you to close, then re-open the parameter set to have it pick up the changes.

6.3. …the Pipeline UID changes and parameter sets using the old id already exist?

Assuming you updated the respective catalog entry, the parameter set will no longer be able to resolve its id to the required pipeline implementation and therefore cannot be used any longer.

Also, when the pipeline document a catalog UID lookup resolves to does not actually match the requested UID, an error dialog will be shown and the parameter set cannot be used.

6.4. …there is no mapping in the catalog system for a certain parameter set UID?

In this case, the system will try to load the pipeline implementation from the system path additionally stored in the parameter set. This path holds the absolute path to the pipeline document at the time the File > Save to Parameter Set… command was run. When this file still exists and it is a pipeline document that has the requested Pipeline UID, then that pipeline implementation is loaded. Otherwise, an error is issued and the parameter set cannot be opened.
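The lookup order described in 6.3 and 6.4 can be summarised in a short sketch. All names are hypothetical: catalog is a dictionary from Pipeline UID to location, stored_path is the path recorded in the parameter set, and read_uid is an assumed callback that returns the Pipeline UID of the document at a location, or None if the file is missing or is not a pipeline document.

```python
def locate_pipeline(uid, catalog, stored_path, read_uid):
    """Return the location of the pipeline implementing uid, or raise."""
    if uid in catalog:
        path = catalog[uid]
        # The document the catalog resolves to must carry the requested UID.
        if read_uid(path) != uid:
            raise LookupError("catalog entry does not match requested UID")
        return path
    # No catalog mapping: fall back to the path stored in the parameter set.
    if stored_path is not None and read_uid(stored_path) == uid:
        return stored_path
    raise LookupError("no pipeline implementation found for UID " + uid)


# Stand-in for the file system: location -> UID of the pipeline stored there.
documents = {"/pipelines/docbook.ucdoc": "uid-1", "/old/acme.ucdoc": "uid-2"}
found = locate_pipeline("uid-2", {"uid-1": "/pipelines/docbook.ucdoc"},
                        "/old/acme.ucdoc", documents.get)
```

Here uid-2 has no catalog entry, so the stored path is used because the document found there still carries the requested UID.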

Chapter 10. Grouping using Painters

Basically, the action of making consecutive sibling nodes based on certain conditions children of a newly created surrounding element is called grouping. These conditions are exposed to you by way of the unique painter concept.

1. The Painter concept

To understand the painter concept, you first of all need to be fully aware of the following, most important fact: Grouping is always performed on a flat, linear list of nodes. Huh? I thought we’re working on a document tree? Though this is of course true, grouping only occurs among sibling nodes, i.e. all direct child nodes of an individual element. Any element’s direct children can be expressed as an ordered, flat list. Of course, we recursively group on a child’s list of children, but this is a completely independent grouping operation. So again, a single, independent grouping operation is always performed on a flat, ordered list of nodes.

Now, for the following let’s think of nodes being white bricks placed in an ordered row on the floor. These bricks can be painted with one (or even several – think: spotty!) colors. The color indicates the element by which the bricks should be grouped.

The grouper does one very simple thing: It wraps all adjacent, likewise colored nodes in a parent element (think of this being some kind of bag) that has the same name as the color of the nodes it wraps.
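The wrapping step itself is simple once the nodes are painted. The following is a minimal sketch under simplifying assumptions: each node carries at most one color, unpainted nodes have the color None, and start and end tags are not consulted. The function name and the tuple representation are hypothetical.

```python
from itertools import groupby


def group_painted(nodes):
    """Wrap runs of adjacent, likewise colored nodes.

    nodes is a list of (name, color) pairs in sibling order. The result
    lists unpainted node names as they are and each painted run as
    (color, [names]), where color is the name of the wrapping element.
    """
    result = []
    for color, run in groupby(nodes, key=lambda node: node[1]):
        names = [name for name, _ in run]
        if color is None:
            result.extend(names)            # unpainted: leave in place
        else:
            result.append((color, names))   # painted: wrap in one parent
    return result


siblings = [("p1", None), ("li1", "list"), ("li2", "list"),
            ("p2", None), ("li3", "list")]
grouped = group_painted(siblings)
```

The two adjacent list-colored nodes end up in one wrapper, while the third, separated by an unpainted node, gets a wrapper of its own.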

So the essential part to be done beforehand is to color the nodes in the desired way. This is a two-step process: First, you need to check the role of each node as far as grouping is concerned and assign it that role by placing a painter on it that knows how to go about painting for this specific role. Second, the painting is actually performed.

1.1. Tagging nodes and placing the painters

In this first step, consider yourself a paint-shop owner, making a work-plan for your painter employees. Equipped with a packet of self-adhesive post-it notes and a pencil, you start figuring out the work to be done at the first node in the list of sibling nodes. For now, you are just interested in determining which nodes should be collected into groups of the color green. You examine the node you are on. For example, you may look at some of its attributes or layout properties, or perform a more complex examination which may include evaluating a boolean XPath expression. After some pondering, you will come to a certain conclusion as to the role of the node you are currently standing on. This can be one of the following:

  • You know that this node will always start a group of the color you are currently considering (i.e. green). Therefore, you write "start green" on one of your post-its and tack that to the node.

  • You know that this node will always end (and therefore be the last one in) a group of the color you are currently considering (i.e. green). Therefore, you write "end green" on one of your post-its and tack that to the node.

Now it is time to think of which of your painter employees is best suited for the painting job. For this you have to evaluate the constellations that may happen in your document regarding the nodes that should be grouped.

For example, you may know that if you don’t find a node starting a group and a node ending the group, the grouping should not occur. In other words, the known start and end nodes (i.e. nodes that fulfill the requirements for being tagged as such) are required for a grouping to happen.

Other situations could be as follows: group from a start node to the next start node, group from an end node to the next end node, group adjacent likewise colored nodes, etc. For each of these situations, you have dedicated painters. To have them do their work in the next step, you place them on nodes.

Suppose in our example, we require a start and end node for a grouping to happen, and we have just tagged the current node as a start node. We therefore choose a start-end painter and place it on the current node.

When we have done both, tagged the node (if possible) and placed a painter (if we could determine a suitable one), we move on to the next node in the ordered sequence and start over.

Finally, we’ll reach the last node in the sibling node sequence and will have tagged some nodes and/or placed painters on some of the nodes. Now, all preparation work is done and we can tell the painters to do their work, i.e. start painting.

1.2. Painting the nodes

Now, consider yourself a painter, with a bucket of color of a certain kind (the color-"name" corresponds to the element name that should be the grouping element later). In the previous step, you have been placed on some node in the sequence.

Depending on your kind, you try to paint from your location.

In our example, you are a start-end painter. This means from the place you are at, you look in direction of the start of the sequence and look for the nearest node that has been tagged with a "start green" label. (This may be the node you are standing on.) If you find such a node, you remember it. If you do not find such a node, you cannot fulfill your task (which is "Paint from start node to end node") and give up, not painting anything.

Next, you look into the direction of the end of the sequence and look for the nearest node tagged with an "end green" label. (This may, again, be the node you are standing on.) If you find that as well, you can fulfill your painting job and start painting all nodes from the start node you found to the end node you found (including both). Then, you are finished.

The above is repeated for all painters that have been placed on nodes in the current node sequence. After this has been finished, the complete sequence has been painted in such a way that the actual grouping can take place, based on the paint color information on each node and the start and end tagging.

2. Node Tags

For each color, a node can have either no tag, or it can be tagged as a start node, tagged as an end node, or tagged as both, start and end node for that respective color.

These tags can currently be set using the UPL functions mark-start() and mark-end().

3. Painter Types

The example in the introduction to the painter concept already mentioned the start-end painter type. Painters can be placed on a node using the UPL function set-painter().

Note that you can place an ordered list of painters for a single color on a node. The idea is to have fallback painters when the first one fails to paint because its requirements cannot be fulfilled (like e.g. for a start-end painter, when there’s either no start tag or end tag). In such a case, painting using the second-specified painter is tried. If that cannot paint as well due to unsatisfied requirements, the next painter is tried and so on until either a painter is able to paint, or the end of the list is reached, in which case no painting occurs.
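The fallback mechanism amounts to trying each painter in order and stopping at the first one that can paint. In the following minimal sketch, a painter is modelled as a function that returns the node indexes it would paint, or None when its requirements are not met. The tags of one color are modelled as a list of sets, one per sibling node. Both painters shown are simplified, hypothetical stand-ins for illustration, not the actual painter implementations.

```python
def start_to_here(tags, pos):
    """Paint from the nearest start tag at or before pos up to pos."""
    for i in range(pos, -1, -1):
        if "start" in tags[i]:
            return list(range(i, pos + 1))
    return None  # no start tag found: this painter fails


def here_only(tags, pos):
    """Hypothetical last-resort painter: paints just its own node."""
    return [pos]


def paint_with_fallback(painters, tags, pos):
    """Try each painter in order; return the first successful result."""
    for painter in painters:
        painted = painter(tags, pos)
        if painted is not None:
            return painted
    return None  # end of list reached: no painting occurs


untagged = [set(), set(), set()]
painted = paint_with_fallback([start_to_here, here_only], untagged, 2)
```

With no start tag present, the first painter fails and the second one paints only the node the painters were placed on.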

In the examples below for each painter, we use the following symbols:

Legend for example graphics

A description of all available painter types follows:

3.1. start-end

This painter paints from the nearest start-tagged node (searching towards the start of the node sequence) to the nearest end-tagged node (searching towards the end). Both searches include the painter’s own node.

There must not be an end-tagged node between the painter and the nearest start-tagged node, nor a start-tagged node between the painter and the nearest end-tagged node. In both of these cases, the painter will fail. The "-" in the name symbolizes a direct, uninterrupted (by other tags) sequence of nodes between start and end.

Results of a start-end painter
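The start-end rules can be made concrete with a short Java toy model. It is a sketch of one reading of the description, not upCast's code. The assumptions: "between" means strictly between the painter and the tag found, and a node carrying both tags is accepted as a boundary. The class and method names are hypothetical.

```java
import java.util.Optional;

// Toy model of one reading of the start-end rules; not upCast's implementation.
class StartEndSketch {

    /**
     * Returns {from, to} (inclusive indices) or empty if the painter fails.
     * Assumption: "between" means strictly between the painter and the tag found,
     * and a node carrying both tags is accepted as a boundary.
     */
    static Optional<int[]> startEnd(boolean[] start, boolean[] end, int pos) {
        int from = -1;
        for (int i = pos; i >= 0; i--) {          // look towards the start
            if (start[i]) { from = i; break; }
            if (i < pos && end[i]) { return Optional.empty(); } // interrupted by an end tag
        }
        int to = -1;
        for (int i = pos; i < end.length; i++) {  // look towards the end
            if (end[i]) { to = i; break; }
            if (i > pos && start[i]) { return Optional.empty(); } // interrupted by a start tag
        }
        if (from < 0 || to < 0) {
            return Optional.empty();              // a required tag is missing
        }
        return Optional.of(new int[] {from, to});
    }

    public static void main(String[] args) {
        boolean[] start = {false, true, false, false, false};
        boolean[] end   = {false, false, false, true, false};
        Optional<int[]> r = startEnd(start, end, 2);
        System.out.println(r.map(x -> x[0] + ".." + x[1]).orElse("fails")); // prints 1..3
    }
}
```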

3.2. start*end

This is the same as start-end, but it allows differently tagged nodes between the nearest start- and end-tagged nodes (see last two examples). The "*" in the name symbolizes a wildcard sequence of tagged nodes between start and end.

The painter will fail if either there’s no start-tagged node earlier in the node list or no end-tagged node later in the node list.

Results of start*end painter

3.3. start-here

This painter will paint from the last start-tagged node up to the one it was placed on.

There may be no end-tagged node between the painter and the nearest start-tagged node. If this is the case, the painter will fail. The "-" in the name symbolizes a direct, uninterrupted (by other tags) sequence of nodes between start and painter position.

The painter will also fail if there is no start-tagged node earlier in the node list.

Results of start-here painter

3.4. start*here

This is the same as start-here, but it allows end-tagged nodes between it and the nearest preceding start-tagged node (see last two examples). The "*" in the name symbolizes a wildcard sequence of end-tagged nodes between start and painter node.

The painter will fail if there is no start-tagged node earlier in the node list.

Results of start*here painter

3.5. here-end

This painter will paint from the node it is placed on up to the next end-tagged node.

There may be no start-tagged node between the painter and the next end-tagged node. If this is the case, the painter will fail. The "-" in the name symbolizes a direct, uninterrupted (by other tags) sequence of nodes between painter position and end.

The painter will fail if there is no end-tagged node later in the node list.

Results of here-end painter

3.6. here*end

This is the same as here-end, but it allows start-tagged nodes between it and the nearest following end-tagged node (see last example). The "*" in the name symbolizes a wildcard sequence of start-tagged nodes between painter node and next end-tagged node.

The painter will fail if there is no end-tagged node later in the node list.

Results of here*end painter

3.7. start-start

This painter paints from the nearest preceding start-tagged node to the next start-tagged node (not including the latter).

It fails if there is an end-tagged node in-between. It also fails if there is no start-tagged node.

Results of start-start painter

3.8. start*start

This is the same as the start-start painter, however end-tagged nodes between those marked with a start tag are allowed (see examples 7 and 8).

Results of start*start painter

3.9. end-end

This painter paints from the nearest preceding end-tagged node to the next end-tagged node (not including the former).

It fails if there is a start-tagged node in-between. It also fails if there is no following end-tagged node.

Results of end-end painter

3.10. end*end

This is the same as the end-end painter, however start-tagged nodes between those marked with an end tag are allowed (see examples 2 and 4).

Results of end*end painter

3.11. this

This painter only colors the node it is placed on.

This painter never fails.

Results of this painter

4. Grouping algorithm

Grouping is performed on the whole document tree in a bottom-up document order. It is performed individually for each element’s children. It is also performed in a defined color order that you can specify, i.e. colors are always processed in a defined order.

Grouping does take into account node start and end tags. This is necessary in order to support directly adjacent groups. If grouping was only based on contiguous coloring, adjacent groups would not be possible since the grouper would not know where to split contiguously colored nodes into groups. In this, tags live up to their original roles, that is start tags always start a new group on that respective node and end tags end the currently open group after that node.

The following sample graphic shows, for a single color, how grouping takes place in a specific painting/tagging situation:

Final grouping of a given node sequence with markers

Group 1 is delimited by the start tag of node #2.

Group 2 is delimited by the end tag of node #2.

Group 3 is delimited by the end tag of node #4.

Group 4 is delimited by the end tag of node #7.

Group 5 is delimited by the non-painted node #9.

Group 6 is delimited by the end of the node sequence.

When placing tags on nodes it is therefore important to always bear in mind that these tags will also govern the final grouping in situations where painted nodes are adjacent.
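The interplay of paint and tags during grouping can be sketched in Java for a single color. This is a toy model under stated assumptions, not the grouper's actual code: a start tag closes any open group and opens a new one on its node, an end tag closes the open group after its node, and an unpainted node or the end of the sequence closes whatever is open. The sample input is a hypothetical sequence built to exercise the same kinds of delimiters as the list above; it is not taken from the figure.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of grouping for one color; not upCast's implementation.
class GroupingSketch {

    /** Returns the groups as {from, to} index pairs (inclusive, 0-based). */
    static List<int[]> group(boolean[] painted, boolean[] start, boolean[] end) {
        List<int[]> groups = new ArrayList<>();
        int open = -1; // index where the currently open group began, -1 if none
        for (int i = 0; i < painted.length; i++) {
            if (!painted[i]) {                       // unpainted node ends a group
                if (open >= 0) { groups.add(new int[] {open, i - 1}); open = -1; }
                continue;
            }
            if (open >= 0 && start[i]) {             // start tag splits before this node
                groups.add(new int[] {open, i - 1});
                open = -1;
            }
            if (open < 0) { open = i; }
            if (end[i]) {                            // end tag splits after this node
                groups.add(new int[] {open, i});
                open = -1;
            }
        }
        if (open >= 0) { groups.add(new int[] {open, painted.length - 1}); }
        return groups;
    }

    public static void main(String[] args) {
        boolean[] painted = {true, true, true, true, true, true, true, true, false, true, true};
        boolean[] start = new boolean[11];
        boolean[] end = new boolean[11];
        start[1] = true;                             // node with both a start and an end tag
        end[1] = true;
        end[3] = true;
        end[6] = true;
        for (int[] g : group(painted, start, end)) {
            System.out.println(g[0] + ".." + g[1]);
        }
    }
}
```

For this input the sketch yields six groups: 0..0, 1..1, 2..3, 4..6, 7..7 and 9..10.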

5. Examples

The following examples are ones you may encounter in one form or another in your own grouping requirements:

5.1. Grouping by paragraph class

Suppose you want to group adjacent paragraphs that are of class "Note", because you want to group them using a note element.

The UPL code to run in the UPL processor before the grouper should look like this:

[element(uci:par) and @uci:class="Note"] {
  set-painter( note, {"this"} );
}

This will set a painter of color "note" and type this on all uci:par elements that are of class "Note". During painting, those nodes will be painted with the specified color, and during grouping each run of adjacent nodes painted in that color will be wrapped in a <uci:block uci:type="note">…</uci:block> element.

5.2. Grouping with a known start element

Suppose you want to group nodes where you know exactly which conditions must be met by a node to start a group, but you don’t know the end. What you additionally do know is which kind of nodes are certainly part of the group (if they exist).

Let’s say we have the following XML fragment of sibling nodes:

➊ <p>Some text.</p>
➋ <p class="example-title">Example</p>
➌ <p class="example-text">Fruits are:</p>
➍ <list>
  <item>apples</item>
  <item>bananas</item>
</list>
➎ <p class="example-text">All these can be bought at Miller’s.</p>
➏ <p>As you have seen,…</p>

In this example, you know that paragraphs of class "example-text" always are part of an example, and that an example is always started by a paragraph of class "example-title". You do not know more, i.e. there may be arbitrary elements in-between like the list element in the example.

A suitable UPL code to group the elements #2 to #5 could be:

[element(p) and @class="example-title"] {
  mark-start( example );
  set-painter( example, {"this"} ); /* optional, see below */
}
[element(p) and @class="example-text"] {
  set-painter( example, {"start-here"} );
}

What does this do?

First, a start tag with color "example" is set on node #2, along with a painter that only colors itself. This is necessary when an example is allowed to only consist of an "example-title"-paragraph. If you require an example to at least have one "example-text"-paragraph to be a valid example, don’t use the line of code marked optional in the above.

Then, a painter of color "example" is placed on node #3 that paints from the nearest preceding start tagged node of color "example" up to itself. On the list element (#4), no painter or tag is set. On node #5, we again set a painter of color "example" that paints from the nearest preceding start tagged node of color "example" up to itself.

This happens during the run of the UPL program in the UPL Tree-Processor module.

Now, it’s the grouper’s turn, and it is about to perform the grouping for the color "example". As we have seen above, the first thing it does is apply the painting through the painters. The painters execute in document order, one after the other, so you get the following sequence of painting and – finally – grouping:

First, painter P1 does its node painting. It is a this painter and therefore only paints the node it was placed on. Next comes painter P2 of type start-here. Finally, painter P3 starts painting. It is also of type start-here and therefore paints from the nearest preceding start tag up to the node it was placed on. Then the grouping G is created and nodes #2 to #5 are wrapped by a <uci:block uci:type="example">…</uci:block> element.

Note how the list node #4 is painted by painter P3 even though it has neither been tagged nor has a painter been placed on it. Instead of the list node, any number of nodes not known in advance could have been present between node #3 and #5, and they would have been automatically grouped into an "example". This is a very important fact to both keep in mind and utilize to your advantage, for example in documents that have no strict, dependable structure but where you must work with only few known node constellations.

But what if…? Surely you have asked yourself, "But what if some badly authored document contains an ‘example-text’-paragraph without a preceding ‘example-title’-paragraph?" Here, the precise definition of the painter types comes into play.

Let’s assume node #2 is removed from the above example sequence. In this case, painter P2 would be the first painter to be executed. It is of type start-here, which fails if no suitable start-tagged node is found – which is the case here: there is no start-tagged node at or earlier in the node sequence. P2 fails, and a painter failing means it does not paint anything. The same is true for painter P3, with the effect that no node gets painted at all if node #2 (i.e., a start-tagged node) does not exist. Consequently, no grouping will occur.

Maybe that is not what you want. Maybe you want semantics like, "If a start-tagged node exists, then use that. If, however, it doesn’t, then at least make the individual ‘example-text’-paragraphs groups." This is where the painter fallback types come in handy. For the above, you’d need to change the UPL code as follows:

[element(p) and @class="example-title"] {
  mark-start( example );
  set-painter( example, {"this"} ); /* optional, see below */
}
[element(p) and @class="example-text"] {
  set-painter( example, {"start-here", "this"} );
}

Note the added painter type this in the second rule. This has the effect that when the first painter type (start-here) fails, the next – this – is tried, which – as already described – only paints the node the painter was placed on. So if node #2 was missing in our example, with the new UPL code we’d make sure that at least the paragraphs of class "example-text" would get painted, and therefore grouped, either on their own as in our example or, if adjacent, as a whole.

More real-world examples will be posted as supplemental material on our website in form of tutorials and how-tos in the following weeks and months.

Chapter 11. XML Namespaces in upCast

The use of XML namespaces is a core concept of upCast. Namespaces are essential to the processing pipeline, since they allow the clash-free co-existence of user-defined attributes and elements with upCast’s automatically generated elements and attributes. Clear separation of element and attribute domains allows targeted, semantically clear selection and filtering of the rich information present in the internal tree at serialization time.

1. The upcast-internal namespace (uci)

namespace name: http://www.infinity-loop.de/namespace/2006/upcast-internal
recommended prefix: uci

All elements and attributes of the upCast Internal DTD are members of the http://www.infinity-loop.de/namespace/2006/upcast-internal namespace. The suggested namespace prefix is uci.

Besides the goal of avoiding name clashes, attributes are members of the upcast-internal namespace so that they can be put on any element in the internal tree, even if it is a non-upcast-internal element, and still be recognized easily as such.

2. The upcast-css namespace (css)

namespace name: http://www.infinity-loop.de/namespace/2006/upcast-css
recommended prefix: css

The upcast-css namespace with the name http://www.infinity-loop.de/namespace/2006/upcast-css has a recommended prefix of css. It is currently only used for attributes and does not contain any elements. It is essentially a virtual namespace, which means that elements of the internal tree pretend to have attributes in the upcast-css namespace, when in fact these attributes are not actually materialized on the elements for efficiency reasons, but are generated on the fly when queried.

The css namespace contains the current value of all properties at the context node that have been defined by either applying a class to an element or a manual style override. It is assumed that all properties are inherited, and that manual overrides take precedence over class application when occurring on the same node.

The upcast-css namespace contains CSS styling properties mapped to an attribute representation. Since not all CSS property names used in upCast are also valid XML attribute local names, the following translation applies:

CSS property name    virtualized attribute name
-ilx-name            css:ilx-name
othername            css:othername
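As a sketch, the translation in the table can be written as a small Java helper. The class and method names are hypothetical, and the rule is inferred from the two table rows only: a leading hyphen of an -ilx- property is dropped, any other name is kept unchanged, and the namespace prefix is prepended.

```java
// Sketch of the name translation shown in the table; names are hypothetical.
class CssAttributeNames {

    /** Maps a CSS property name to its virtualized attribute name for the given prefix. */
    static String toVirtualAttributeName(String prefix, String cssProperty) {
        // "-ilx-name" is not a valid XML local name, so the leading hyphen is dropped.
        String local = cssProperty.startsWith("-ilx-") ? cssProperty.substring(1) : cssProperty;
        return prefix + ":" + local;
    }

    public static void main(String[] args) {
        System.out.println(toVirtualAttributeName("css", "-ilx-name")); // prints css:ilx-name
        System.out.println(toVirtualAttributeName("css", "othername")); // prints css:othername
    }
}
```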

The only time the virtual attributes in the upcast-css namespace are actually materialized is when an exporter wants to serialize the internal tree verbatim, e.g. for reference reasons.

Note

To export materialized attributes in the upcast-css namespace using the supplied XML Exporter, turn on its Explode CSS style info into real attributes option.

3. The upcast-cssoverride namespace (csso)

namespace name: http://www.infinity-loop.de/namespace/2006/upcast-cssoverride
recommended prefix: csso

The upcast-cssoverride namespace with the name http://www.infinity-loop.de/namespace/2006/upcast-cssoverride has a recommended prefix of csso. It is currently only used for attributes and does not contain any elements. It is essentially a virtual namespace, which means that elements of the internal tree pretend to have attributes in the upcast-cssoverride namespace, when in fact these attributes are not actually materialized on the elements for efficiency reasons, but are generated on the fly when queried.

The upcast-cssoverride namespace contains CSS styling properties mapped to an attribute representation. It contains only properties that have been brought into the tree by applying a manual, explicit, anonymous style property override at a certain node, usually by way of a style attribute with local style property settings. The properties available in the csso namespace on a specific context node consist of the union of all such properties having been applied either on the node itself or one of its ancestors in the described fashion, in order from document root to context node, unless they are identical in name and value with a property in the fully calculated cssc namespace on that node, in which case they are not added. (It is assumed that cssc properties are always inherited.)
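The rule for which properties show up in the csso namespace can be illustrated with a Java toy model. This is a sketch, not upCast's implementation. It assumes that overrides are merged from document root to context node with nearer nodes replacing ancestor values, and that the comparison against the cssc properties happens once on the merged result. All names are hypothetical, and property names are shown without namespace prefix.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the csso rule; not upCast's implementation.
class CssOverrideSketch {

    /**
     * overridesRootToNode: manual override properties per node, ordered from
     * the document root down to the context node.
     * cssc: the fully calculated class-based properties on the context node.
     */
    static Map<String, String> effectiveOverrides(List<Map<String, String>> overridesRootToNode,
                                                  Map<String, String> cssc) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> level : overridesRootToNode) {
            merged.putAll(level); // nearer nodes replace values from ancestors
        }
        // Drop overrides that are identical in name and value to a cssc property.
        merged.entrySet().removeIf(e -> e.getValue().equals(cssc.get(e.getKey())));
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> result = effectiveOverrides(
                List.of(Map.of("color", "red", "font-weight", "bold"), Map.of("color", "blue")),
                Map.of("font-weight", "bold"));
        System.out.println(result); // prints {color=blue}
    }
}
```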

Since not all CSS property names used in upCast are also valid XML attribute local names, the following translation applies:

property name    virtualized attribute name
-ilx-name        csso:ilx-name
othername        csso:othername

The only time the virtual attributes in the upcast-cssoverride namespace are actually materialized is when an exporter wants to serialize the internal tree verbatim, e.g. for reference reasons.

Note

To export materialized attributes in the upcast-cssoverride namespace using the supplied XML Exporter, turn on its Explode CSS style info into real attributes option.

4. The upcast-cssclass namespace (cssc)

namespace name: http://www.infinity-loop.de/namespace/2006/upcast-cssclass
recommended prefix: cssc

The upcast-cssclass namespace with the name http://www.infinity-loop.de/namespace/2006/upcast-cssclass has a recommended prefix of cssc. It is currently only used for attributes and does not contain any elements. It is essentially a virtual namespace, which means that elements of the internal tree pretend to have attributes in the upcast-cssclass namespace, when in fact these attributes are not actually materialized on the elements for efficiency reasons, but are generated on the fly when queried.

The upcast-cssclass namespace contains CSS styling properties mapped to an attribute representation. The cssc namespace contains only properties that have been brought into the tree by applying a named style class from an external stylesheet onto a node, usually by way of a style reference using the class attribute. The properties available in the cssc namespace on a specific context node consist of the union of all such properties having been applied either on the node itself or one of its ancestors in the described fashion, in order from document root to context node. It is assumed that cssc properties are always inherited.

Since not all CSS property names used in upCast are also valid XML attribute local names, the following translation applies:

property name    virtualized attribute name
-ilx-name        cssc:ilx-name
othername        cssc:othername

The only time the virtual attributes in the upcast-cssclass namespace are actually materialized is when an exporter wants to serialize the internal tree verbatim, e.g. for reference reasons.

5. The upcast-cals namespace (cals)

namespace name: http://www.infinity-loop.de/namespace/2006/upcast-cals
recommended prefix: cals

The upcast-cals namespace with the name http://www.infinity-loop.de/namespace/2006/upcast-cals has a recommended prefix of cals. It is used to differentiate attributes on tables from similarly or likewise named attributes and elements in other table models like HTML or the internal table model. You can therefore already decide at the top-level cals:table element that you are dealing with a CALS table without having to infer this from the further descendant element structure.

6. The HTML namespace (html)

namespace name: http://www.w3.org/HTML/1998/html4
recommended prefix: html

The html namespace with the name http://www.w3.org/HTML/1998/html4 has a recommended prefix of html. It is used to differentiate attributes on tables from similarly or likewise named attributes and elements in other table models like CALS or the internal table model. You can therefore already decide at the top-level html:table element that you are dealing with an HTML table without having to infer this from the further descendant element structure.

7. The XLink namespace (xlink)

namespace name: http://www.w3.org/1999/xlink
recommended prefix: xlink

The XLink namespace with the name http://www.w3.org/1999/xlink has a recommended prefix of xlink. It is used to identify linking attributes on elements.

8. The XML namespace (xml)

namespace name: http://www.w3.org/XML/1998/namespace
recommended prefix: xml

The XML namespace with the name http://www.w3.org/XML/1998/namespace has a recommended prefix of xml.

9. The Variable Realm namespaces

In UPL, you can refer to variables and values in a specific realm using that realm’s namespace. For each realm, there is a corresponding namespace.

For details on UPL variable references, see the UPL specification.

For details on upCast variable realms, see the chapter "The variable system".

realm         namespace name                                                   recommended prefix
application   http://www.infinity-loop.de/namespace/upcast-realm/application   application
environment   http://www.infinity-loop.de/namespace/upcast-realm/environment   environment
pipeline      http://www.infinity-loop.de/namespace/upcast-realm/pipeline      pipeline
module        http://www.infinity-loop.de/namespace/upcast-realm/module        module
javaproperty  http://www.infinity-loop.de/namespace/upcast-realm/javaproperty  javaproperty
include       http://www.infinity-loop.de/namespace/upcast-realm/include       include
map           http://www.infinity-loop.de/namespace/upcast-realm/map           map

10. UPL Utility functions library namespace

namespace name: http://www.infinity-loop.de/namespace/upl/utility-functions
recommended prefix: util

upCast comes with a library of UPL utility function definitions. To keep them separate from your own function definitions, even where they share the same local function name, all these functions are located in a specific namespace: http://www.infinity-loop.de/namespace/upl/utility-functions.

Note

The library of UPL utility functions is currently not documented as it is still work in progress and therefore not stable enough to be used reliably in your own projects.

Chapter 12. Recognized Java system properties

Some settings need to be made early in the startup process of upCast; so early, in fact, that they cannot be read by application-internal means but must already be set and available when upCast starts running. To set those values in cases where their default is not desirable, you can pass them via Java system properties to the JVM running the upCast application.
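For readers less familiar with Java system properties, the following sketch shows how a value passed with -D on the java commandline becomes visible inside the JVM. It uses only the standard Long.getLong() call. The class name is hypothetical and the code is not taken from upCast; the property name and default are those of the logsize entry listed further down.

```java
// Sketch: reading a Java system property with a fallback default.
class SystemPropertySketch {

    /** Returns the configured log size limit in bytes, or 8 MB if the property is unset. */
    static long logSizeLimit() {
        // Long.getLong reads the named system property and parses it as a number;
        // the second argument is used when the property is missing or not numeric.
        return Long.getLong("de.infinityloop.application.logsize", 8L * 1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println(logSizeLimit());
    }
}
```

Started as java -Dde.infinityloop.application.logsize=1048576 SystemPropertySketch, the sketch prints 1048576; without the -D option it prints 8388608.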

The following properties are available. Each is listed with its default, which in some cases is calculated dynamically based on the system or OS the application is running on:

de.infinityloop.exe.location (Windows only)

Default: ${application:BundledResources}/EXEs

Specifies the folder where upCast will look for supporting .exe files (like il-gw.exe, used for WordLink).

de.infinityloop.application.location

Default: (application installation root)

The folder where the application’s installation root lies.

de.infinityloop.application.preferencesdir

Default: (system dependent)

The folder where upCast will write its preferences file to.

de.infinityloop.application.logfile

Default: (system dependent)

The file where upCast will write its external logfile. Whether this value is actually used is dependent on the log subsystem chosen.

de.infinityloop.application.logsize

Default: 8388608 (bytes, i.e. 8 MB)

The maximum size for the application log file. When this size is reached, the log file is automatically cleared and filled with new log entries.

de.infinityloop.loglevel

Default: 3

Set to a number greater than or equal to 0 identifying the threshold below which messages will be output to the log subsystem. The currently used range is 0..7, with 7 being the highest debugging level, i.e. "output always". For verbose logging, set this to a high value. For reduced logging, lower the number.

The default value 3 corresponds to INFO type messages (and above).

Important

When running a pipeline using the de.infinityloop.upcast.RunPipeline class, use the -debug logfilterexp option described there instead of the de.infinityloop.loglevel property.

de.infinityloop.logfilterspec

Default: (empty)

This property can be used for specifying the log event filter used at the interface to the external logging system (usually a file or the console). Additionally, it is used in some selected places within upCast’s code base to prevent the time-consuming creation of complex log events already at their originating place.

The filter expression syntax is the same as described here.

Example 12.1. 

-Dde.infinityloop.logfilterspec=+ERROR,+FATAL,+INFO

only passes messages of type ERROR, FATAL and INFO to the external logging subsystem, but not WARN messages.


Note

At this time, the only supported message constant preventing log message generation already at its origin is CurrentRTFToken.

Chapter 13. Commandline Interface

upCast offers a convenient helper class for running pipeline documents from the commandline. It also allows you to pass parameter values to the pipeline if they have been defined in the Pipeline Settings > Pipeline Parameters tab.

1. How it works

The commandline pipeline document interpreter class reads the specified pipeline document and looks for all defined pipeline parameters in it:

  • If a parameter has the property required set to true, a value for it must be specified on the commandline. If no value is specified, the execution is stopped and an appropriate error message is output to the console.

  • If a parameter does not have the required property set to true, and if no value for it is specified in the commandline call, and if it has its default property specified, that specified value is set.

  • If a parameter does not have the required property set to true, and if its default property has not been specified, and if no value for it is specified in the commandline call, then this parameter will not be set at all. Trying to retrieve the value of such a parameter during pipeline execution will result in an error to the effect that the requested parameter or pipeline variable is undefined.

  • If a parameter is specified on the commandline that is not defined as a parameter in the pipeline, an error is issued to the console and execution is halted.

After these checks, the parameter values that are defined will be set as variables in the pipeline realm (as is the case when running the pipeline in Simple View mode), and then the modules will be executed in the order defined in the pipeline document.
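The four rules above can be summarized in a short Java toy model. It is a sketch of the described behavior only: the class ParameterResolutionSketch, the ParamDef type and the sample parameter names are hypothetical and not part of the upCast API, and the order in which the real interpreter performs its checks may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the checks described above; class and field names are hypothetical.
class ParameterResolutionSketch {

    static final class ParamDef {
        final String name;
        final boolean required;
        final String defaultValue; // null means "no default specified"

        ParamDef(String name, boolean required, String defaultValue) {
            this.name = name;
            this.required = required;
            this.defaultValue = defaultValue;
        }
    }

    /** Returns the pipeline variables that end up being set, or throws on an error. */
    static Map<String, String> resolve(List<ParamDef> defs, Map<String, String> commandline) {
        // A value for a parameter the pipeline does not define is an error.
        for (String given : commandline.keySet()) {
            boolean known = defs.stream().anyMatch(d -> d.name.equals(given));
            if (!known) {
                throw new IllegalArgumentException("Unknown parameter: " + given);
            }
        }
        Map<String, String> pipelineVars = new LinkedHashMap<>();
        for (ParamDef d : defs) {
            if (commandline.containsKey(d.name)) {
                pipelineVars.put(d.name, commandline.get(d.name));
            } else if (d.required) {
                throw new IllegalArgumentException("Missing required parameter: " + d.name);
            } else if (d.defaultValue != null) {
                pipelineVars.put(d.name, d.defaultValue);
            } // otherwise the variable stays undefined
        }
        return pipelineVars;
    }

    public static void main(String[] args) {
        List<ParamDef> defs = List.of(
                new ParamDef("SourceFile", true, null),
                new ParamDef("Encoding", false, "UTF-8"),
                new ParamDef("Comment", false, null));
        System.out.println(resolve(defs, Map.of("SourceFile", "/test/in/in.rtf")));
        // prints {SourceFile=/test/in/in.rtf, Encoding=UTF-8}
    }
}
```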

Important

A pipeline document to be run by the commandline interface must be self-contained, i.e. it must explicitly specify

  • its license file

  • catalogs to be used

  • font configuration definitions or overrides

  • any custom encodings to be used

You should make sure that pipeline documents intended to be run via the commandline do not have their Use application settings checkbox checked on any of the Pipeline Settings > Catalogs, Pipeline Settings > Font configuration, Pipeline Settings > Encodings and Pipeline Settings > License tabs.

Note that by default, upCast’s built-in templates have this checkbox checked!

Note

For parameters of type popup, the internal value (from the internal-values property list) must be passed in as the parameter value, not the displayed value.

2. Synopsis

java -classpath upcast.jar de.infinityloop.upcast.RunPipeline parameters...

with parameters being:

[0]

absolute path to the pipeline document to be run

[1..n]

standard options

Standard options are as follows:

-p <name> <value>

set the pipeline parameter name to the value value

-debug <N> | <filterexpr>

turn on debug output for the conversion with specified level of verbosity with N being a number between 0 (least verbose) and 7 (annoyingly verbose). Alternatively, you can specify a filter expression (as string) that follows the log filter expression syntax as defined here.

-version

display upCast version information

-help

show help on the defined parameters for the specified pipeline document

-catalog <file>+

one or more (XML-) catalog files to set up as global upCast catalogs before further processing. Setting this option is essential when using parameter sets (*.ucpar) that rely on resolving their PUBLIC identifier to find the corresponding pipeline implementation file.

-license <url>

override the license specified in the pipeline with this specified one (absolute URL path)

To view a current synopsis of the implementation of the commandline interface, issue

java -cp upcast.jar de.infinityloop.upcast.RunPipeline
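If you launch the commandline interface from another Java program, the argument list from the synopsis can be assembled as in the following sketch. Only the documented -p option is used. The class name, the pipeline document path and the parameter name SourceFile are placeholders chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: building the RunPipeline argument list; paths and names are placeholders.
class RunPipelineCommandSketch {

    static List<String> command(String jarPath, String pipelinePath, Map<String, String> params) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-classpath");
        cmd.add(jarPath);
        cmd.add("de.infinityloop.upcast.RunPipeline");
        cmd.add(pipelinePath);            // [0]: absolute path to the pipeline document
        for (Map.Entry<String, String> p : params.entrySet()) {
            cmd.add("-p");                // -p <name> <value>
            cmd.add(p.getKey());
            cmd.add(p.getValue());
        }
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("SourceFile", "/test/in/in.rtf"); // hypothetical pipeline parameter
        System.out.println(String.join(" ",
                command("upcast.jar", "/path/to/pipeline-document", params)));
    }
}
```

The resulting list can be handed to new ProcessBuilder(cmd).inheritIO().start(); calling waitFor() on the returned process yields the exit code described under Exit codes.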

Referring to the embedded evaluation license

The application jar (upcast.jar) contains an embedded evaluation license.

Since the commandline has no knowledge about the system-dependent preferences file storage location, you need to always specify the license manually when calling the CLI (or Java API) unless you have specified the license to use explicitly in the pipeline settings (which is recommended for any pipeline intended to be run via CLI or Java API).

The path to use for the embedded evaluation license is as follows:

jar:file:!/de/infinityloop/upcast/resources/licenses/upcast-eval.uclicense

When using the CLI, add the parameter

-license "jar:file:!/de/infinityloop/upcast/resources/licenses/upcast-eval.uclicense"

to your call.

3. Exit codes

The RunPipeline class calls Java's System.exit() function after running the pipeline. It returns a numeric exit code. This exit code can either be one of the upCast-reserved exit codes, or be an exit code the pipeline itself sets.

Important

Exit codes in the range from 0 to 99 are reserved for upCast's own use.

Custom exit codes should be in the range from 100 to 250.

3.1. upCast exit codes

The following exit codes are currently used by upCast:

0

SUCCESS: the pipeline was successfully processed

1

GENERALERROR: some general error occurred during pipeline processing; for details, see the log file entries

2

PIPELINENOTFOUND: the pipeline document to be run was either not found or cannot be read; check the path to the file and whether it is readable by the user that runs upCast

3

ARGUMENTERROR: the number or types of arguments passed does not match the pipeline document's parameter definitions

3.2. Custom exit code pipeline:ModuleResult

To return a custom exit code, you must make sure that the pipeline variable ModuleResult of the top-level pipeline is set to a corresponding value. That value must be castable to an integer in the range from 0 to 255.

By default, ModuleResult contains the value last set by a module in execution order. You can override that value – if required – in the top-level pipeline's custom finalization function finalize() by writing the value to the $pipeline:ModuleResult variable explicitly.
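A calling program might interpret the exit code along the lines of the following sketch. The class name and the labels RESERVED, CUSTOM and OUT_OF_RECOMMENDED_RANGE are made up for illustration; the numeric values and ranges are the ones given in this chapter.

```java
// Sketch: classifying a RunPipeline exit code; labels other than the four
// documented upCast codes are hypothetical.
class ExitCodeSketch {

    static String describe(int code) {
        switch (code) {
            case 0: return "SUCCESS";
            case 1: return "GENERALERROR";
            case 2: return "PIPELINENOTFOUND";
            case 3: return "ARGUMENTERROR";
            default:
                if (code >= 4 && code <= 99) {
                    return "RESERVED";               // reserved for upCast's own use
                }
                if (code >= 100 && code <= 250) {
                    return "CUSTOM";                 // set by the pipeline itself
                }
                return "OUT_OF_RECOMMENDED_RANGE";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(2));   // prints PIPELINENOTFOUND
        System.out.println(describe(120)); // prints CUSTOM
    }
}
```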

Chapter 14. Java API (low-level)

Important Note

It is strongly recommended to use the detailed Java API (described in this chapter) only when your requirements do not let you use the static de.infinityloop.upcast.UpcastEngine.runPipeline() method. This is the case e.g. when you must dynamically select and parameterize the modules and their sequence of execution, or must react specifically in Java code on error conditions after each single module execution.

If you do not absolutely need these fine-grained control capabilities, which is usually the case when you can set up and run the pipeline you need using just the upCast GUI, please do not use the low-level API described in the following. Just use the de.infinityloop.upcast.UpcastEngine.runPipeline() method with the pipeline or parameter set file you developed in the GUI instead. This makes changes to the pipeline possible without the need for re-compilation, and therefore makes maintenance much easier.

Complete sample code (just a few lines of Java) ready for copy and paste into your Java project for each individual pipeline using de.infinityloop.upcast.UpcastEngine.runPipeline() can be obtained from the pipeline documentation in HTML format you get from File > Generate Documentation….

1. Concepts

Accessing upCast functionality is carried out via one instance of a broker object: UpcastEngine. You should create one instance of that object at startup and use it for many subsequent conversions, since creation of this object is rather expensive. There are no problems in reusing that object for subsequent conversions (in contrast to many XML parser implementations, for example). On the contrary, reuse is highly recommended from a performance point of view.

You may create several instances of the UpcastEngine object in order to run multiple conversion threads at the same time in your single application. Please note that the maximum number of parallel threads may be restricted by your license.

2. Using the API

We assume that you are familiar with Java programming and its concepts like objects, interfaces and implementations. You should also be fluent with the Java object notion and with Java Streams.

The javadoc API reference can be opened via the Help > Javadoc API Documentation… menu command.

2.1. General programming steps

The general programming steps are as follows:

  1. Instantiate a de.infinityloop.upcast.UpcastEngine object. You can think of this object as the interface to your pipeline.

  2. Set the pipeline base URI using the setPipelineBaseURI() method.

  3. Register that instance with an appropriate license file using its setLicense() method.

  4. Set global pipeline parameters like catalogs to use, overrides to the standard font configuration and custom encodings to use via the appropriate instance methods.

  5. Call the initializeConversion() method.

  6. Set pipeline variables using the setPipelineVariable() method.

  7. Choose a module class via the method setModuleType(), which then internally gets instantiated and becomes the current module.

  8. Set module parameters using (possibly repeated calls to) the setModuleParameter() method.

  9. Start the module execution by calling runModule().

  10. (optional) Repeat from step 7 for subsequent modules in the desired pipeline.

  11. Call the cleanupConversion() method.

  12. (optional) Repeat from step 5 for converting another document.

Expressed in actual Java code, this might look something like this:

import de.infinityloop.upcast.UpcastEngine;

String moduleID = null;
// Setup: done once per engine instance
UpcastEngine ucInst = new UpcastEngine( "instance one" );
ucInst.setPipelineBaseURI( "file:///path/to/basefolder/" );
ucInst.setLicense( "file:///path/to/upcast.uclicense" );
// One conversion: initializeConversion() clears the pipeline variable realm,
// so pipeline variables must be set after it
ucInst.initializeConversion();
  ucInst.setPipelineVariable( "DestinationFolder", "/test/out/" );
  ucInst.setPipelineVariable( "ImageDestinationFolder", "/test/out/" );
  // First module: RTF import
  moduleID = ucInst.setModuleType( UpcastEngine.kRTFImporterType );
  ucInst.setModuleParameter( moduleID, "OrigNumbering", Boolean.TRUE );
  ucInst.setModuleParameter( moduleID, "SourceFile", "/test/in/in.rtf" );
  ucInst.runModule( moduleID );
  // Second module: XML export
  moduleID = ucInst.setModuleType( UpcastEngine.kXMLExporterType );
  ucInst.setModuleParameter( moduleID, "DeleteEmpties", Boolean.FALSE );
  ucInst.setModuleParameter( moduleID, "DestinationFile", "${pipeline:DestinationFolder}/out.xml" );
  ucInst.runModule( moduleID );
ucInst.cleanupConversion();

Tip

To quickly construct a slightly more sophisticated Java source code template for a pipeline you have already built using the GUI, use the File > Export to Java source command. You can then modify this generated code to your liking, preferably by subclassing it and overriding methods where needed.

2.2. Setup

2.2.1. Create an UpcastEngine instance

You gain access to all functionality of upCast by means of objects of a single class: UpcastEngine. An instance of this object is what you will use in your application in order to access the full range of upCast API functionality.

Before you can do anything with upCast, you need to instantiate an UpcastEngine object:

UpcastEngine ucInst = new UpcastEngine( "instance one" );

The UpcastEngine class is to be found in the de.infinityloop.upcast package.

You should keep this object stored in a variable which you can access from all places inside your program where you need to access upCast functionality.

You should strive to have only one instance of the UpcastEngine object per physical CPU at any time for performance reasons. Also make sure you only instantiate this object once during the life of your application process, as instantiating and disposing of this object is a relatively costly operation.

2.2.2. Setting the pipeline:base property

In the GUI version of upCast, this property is set automatically for you, as there is a pipeline document that determines this value. In the API, however, there is no such document, so you must tell the upCast pipeline processor the value of this property. It serves as the basis for resolving any ${pipeline:base} references you might have in module parameter values or pipeline setting values.

ucInst.setPipelineBaseURI( "file:///path/to/basefolder/" );

This should be called immediately after creating the UpcastEngine object instance.
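If the base folder is only known as a local file system path, it has to be turned into a file: URI string first. The following is a small sketch using only the standard java.nio.file API; BaseUriSketch and folderToBaseUri() are hypothetical names, and ending the result with a slash simply mirrors the shape of the example value above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: turns a local folder path into a "file:" URI string ending in "/". */
final class BaseUriSketch {
    static String folderToBaseUri(String folder) {
        Path absolute = Paths.get(folder).toAbsolutePath().normalize();
        String uri = absolute.toUri().toString();
        // Path.toUri() appends the trailing slash only for existing directories, so add it if missing.
        return uri.endsWith("/") ? uri : uri + "/";
    }
}
```

The returned string could then be passed to setPipelineBaseURI().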

2.2.3. Setting the license

To use upCast in API mode, a license file is required that includes either or both of the rtfimporter-api and rtfexporter-api features. If in doubt, contact us at licensing@infinity-loop.de.

License features encoded in a *.uclicense upCast license file can be reviewed by opening the license file in upCast, either using File > Open... or, on Windows and Mac OS X only, by double-clicking the license file.

To set the license, use:

ucInst.setLicense( "file:///path/to/upcast.uclicense" );

Referring to the embedded evaluation license

The application jar (upcast.jar) contains an embedded evaluation license.

When running via the Java API, upCast has no knowledge about the system-dependent preferences file storage location. Therefore, you need to always specify the license explicitly when using the Java API.

The path to use for the embedded evaluation license is

jar:file:!/de/infinityloop/upcast/resources/licenses/upcast-eval.uclicense

When using the Java API, you'd then use this line of code to set the evaluation license:

ucInst.setLicense("jar:file:!/de/infinityloop/upcast/resources/licenses/upcast-eval.uclicense");

2.2.4. Setting pipeline properties

You can set pipeline properties directly on the UpcastEngine instance object. This includes amending or overriding the font configuration (setCustomFontConfiguration()), adding catalog files to be used by XML processing (addCatalog(), discardCatalogs()), and adding custom encodings to the set of built-in ones (addCustomEncoding()). These settings remain valid as long as the UpcastEngine instance lives or until you explicitly clear or set them to different values.

(Do not confuse these settings with the setting of pipeline variables; see below.)

2.3. Building and running a pipeline

Whereas in the GUI, you build a static pipeline by choosing a specific sequence of modules, the API handles a pipeline differently. In fact, there is no concept of a pre-built pipeline setup to be run; instead, you run modules one at a time. This has the great advantage that you can dynamically and programmatically build the actual pipeline for each single conversion, e.g. based on results of a preceding module execution on that input source.

2.3.1. Bracketing a pipeline by initializeConversion() and cleanupConversion()

ucInst.initializeConversion();
  /* ... your pipeline code goes here ... */
ucInst.cleanupConversion();

Since upCast has to do some housekeeping for each conceptual pipeline run (independent of the actual number and sequence of modules run within), you need to tell it when you conceptually start a pipeline for a specific input file, and when you are done with it, i.e. when you have run the last module for this specific input file. This is done by the initializeConversion() and cleanupConversion() methods.

For example, initializeConversion() cleans the pipeline variable realm so that subsequent pipeline runs do not see values set by a previous run. And cleanupConversion() makes sure any temporary files created by some module get properly deleted when they are no longer needed.

Important

It is very important that you obey this pipeline bracketing rule at all times, as strange, non-deterministic behaviour may occur otherwise.
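A common Java idiom for keeping such a bracket intact even when a module fails is try/finally. The sketch below uses stand-in interfaces so that it is self-contained: ConversionBracket and ConversionBody are hypothetical, and in real code the two bracketing calls are the UpcastEngine methods of the same names. Whether you want cleanup to run after every kind of failure is an assumption you should check against your own error handling.

```java
/** Stand-in for the two bracketing calls; in real code these are methods on UpcastEngine. */
interface ConversionBracket {
    void initializeConversion() throws Exception;
    void cleanupConversion() throws Exception;
}

/** Hypothetical body of one conversion: the module setup and runModule() calls. */
interface ConversionBody {
    void run() throws Exception;
}

final class BracketSketch {
    /** Runs the body between the two bracketing calls and cleans up even if the body throws. */
    static void convert(ConversionBracket engine, ConversionBody body) throws Exception {
        engine.initializeConversion();
        try {
            body.run();
        } finally {
            engine.cleanupConversion();
        }
    }
}
```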

2.3.2. Setting pipeline variables

As in the GUI (by way of the Pipeline Variables module), you can set variables in the pipeline realm to be used by modules run subsequently. The method to use is setPipelineVariable(), e.g.:

ucInst.setPipelineVariable( "DestinationFolder", "/test/out/" );

Note

The pipeline variable realm is cleared by a call to initializeConversion(). You must therefore explicitly (re-)set all pipeline variables at the beginning of each new conversion pipeline execution for a document.

2.3.3. Setting up and running a module

Each module to be run has to be set up individually. This is done in three general steps:

  1. Choose and set the module class to use.

  2. Set module parameters.

  3. Run the module.

First, you choose from one of the available module classes and set that using the setModuleType() method:

moduleID = ucInst.setModuleType( UpcastEngine.kRTFImporterType );

This will create a new instance of this module type and set it as the current module.

The constants to be used (in the UpcastEngine class) for the available module types are:

| Module Type | Java constant name |
|---|---|
| Pipeline Variables | kPipelineVariablesProcessorType |
| RTF Importer ("upCast") | kRTFImporterType |
| UPL Processor | kUPLProcessorType |
| UPL Tree Processor | kUPLTreeProcessorType |
| Sectioner | kSectionProcessorType |
| XML Exporter | kXMLExporterType |
| Commandline Processor | kCommandlineProcessorType |
| XSLT Processor | kXSLTProcessorType |
| Unicode Translation Processor | kUnicodeTranslationProcessorType |
| XML Validator | kValidationProcessorType |
| CSS Exporter | kCSSExporterType |
| RTF Exporter ("downCast") | kRTFExporterType |
| XML Importer | kXMLImporterType |
| External Pipeline Processor | kExternalPipelineProcessorType |

The new module's parameters are set to their defaults. The setModuleType() call returns a handle (a String) to that module, which you must pass to subsequent setModuleParameter() calls:

ucInst.setModuleParameter( moduleID, "SourceFile", "/test/in/in.rtf" );

Note

The default parameter settings of modules are not documented. Though usually reasonable, they may change from release to release. We therefore highly recommend setting all parameters of a module explicitly to the desired values, so that your code does not break with an upCast update.

Tip

Like in the GUI version of upCast, you can use variable references in the parameter values which will be resolved by upCast automatically.

Example 14.1. 

To specify the source file relative to the pipeline base directory (whatever value it is currently set to), use a line like

ucInst.setModuleParameter( moduleID, "SourceFile", "${pipeline:base}/source/in.rtf" );

Parameter names for each module are given in the description of the individual modules earlier in this manual. The parameter value has to be passed as a Java Object. The required object class depends on the specific parameter and is documented for each available parameter.

If you set a parameter more than once, the last value set will be used.

To set several parameters, you need to repeatedly call the method setModuleParameter().

Note

If you try to set a parameter that is not supported by the current module, the parameter simply will have no effect, but no error is reported. To track which parameters you set in your application, you should turn on debug logging.

Important

If you use a different Java Object (sub-)class for the parameter value than specified in the reference section, the behaviour is undefined. Some types may be compatible, but in general you will get a Java exception at some point later in the execution of upCast or the operation will not work the way you intended.

Finally, you’ll want to kick off the module’s execution. This is done by the runModule() method:

ucInst.runModule( moduleID );

After this, you can set up the next module exactly as described in this section so far. You can even base the selection of the next module on some condition, such as the value of a pipeline variable that the previous module might have set. You can query the values of variables in the pipeline realm using the getPipelineVariable() method.

3. Connecting to WordLink (Windows only)

To access WordLink functionality also from upCast running via the API, you need to tell it where the WordLink component il-gw.exe is to be found before you instantiate an UpcastEngine object. This is done by setting the system property de.infinityloop.exe.location to the folder where il-gw.exe resides:

System.setProperty( "de.infinityloop.exe.location", "/path/to/il-gw-folder/" );

On a typical Windows installation, this is C:\Program Files\infinity-loop\upCast\Resources\EXEs , but you are free to move the application file il-gw.exe anywhere in your filesystem where it is convenient for your deployed application.
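Because the property has to be in place before the engine is created, it can help to check the folder first. The following is a minimal sketch using only standard Java; WordLinkSetupSketch and configure() are hypothetical names, and appending the trailing separator merely mirrors the example value above.

```java
import java.io.File;

/** Hypothetical helper: sets the WordLink location property only if il-gw.exe is really there. */
final class WordLinkSetupSketch {
    static final String PROPERTY = "de.infinityloop.exe.location";

    /** Returns true if the property was set, false if the folder does not contain il-gw.exe. */
    static boolean configure(File folder) {
        File exe = new File(folder, "il-gw.exe");
        if (!exe.isFile()) {
            return false;
        }
        System.setProperty(PROPERTY, folder.getAbsolutePath() + File.separator);
        return true;
    }
}
```

A caller would invoke configure() before instantiating UpcastEngine and decide what to do when it returns false.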

Important

Using WordLink in a critical server-based unattended environment is not supported and therefore not recommended. WordLink uses an installed copy of Word in component mode. Microsoft explicitly warns against such use in server or server-like applications for technical reasons (let alone any remaining licensing issues).

3.1. Accessing Word as COM object in a restricted environment

WordLink must access and launch Word to do what it needs to do. However, when running in a server environment, rights of running processes are usually tightly restricted. For example, Word might not be allowed to be accessed by the server process as COM object.

To make WordLink work in such restricted environments, you need to explicitly grant the user running the server access to the Word COM object. You can check and do this as follows:

  • On the Windows commandline, start dcomcnfg.exe .

  • Choose the component "Microsoft Word Document" (or similar, depending on localization) and click Properties... .

  • Under Security > Use custom launch permissions, add the account that runs the server using Edit... and then Add... . (On one of our machines, for example, this was "ASPNET (ASP.NET Machine Account)".)

After this modification, WordLink should also work in the restricted environment.

4. API Error handling

During a single call to an API method, several problems may occur, some more significant than others. In every case, the method throws a single UpcastException. UpcastException is a subclass of java.lang.Exception that encapsulates the list of errors and/or warnings that occurred during the last call to an API method.

You can query an UpcastException for its individual constituents, which are objects of type LogEvent. A LogEvent encompasses:

  • a numerical message code

  • a message class, one of: FATAL, ERROR, WARN, INFO, DEBUG, VERBOSE, DETAIL

  • a human readable message as String

  • a (possibly null) array of parameters that were used in constructing the message

4.1. Coding pattern

The recommended coding style for error handling is to wrap each call to an API method in its own try{}/catch{} block and to catch UpcastExceptions explicitly. This is useful if e.g. the runModule() call throws an exception, but the severity is not high and you decide to continue processing because it only contains a warning that you do not care about and that does not affect the document integrity. By wrapping each call separately, you get the maximum out of any sequence of API calls by just skipping the portions that did not work.

A typical API call including error handling would look something like:

try {
  ucInst.runModule( moduleID );
} catch( UpcastException e ) {
  // React only to FATAL or ERROR entries; warnings are ignored
  if( e.extractSignificantEntries(
      new int[] {
        LogEvent.FATAL,
        LogEvent.ERROR
      },
      null,
      null ).size() > 0 ) {
    // ... do some error handling ...
  }
}
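What the check above amounts to is selecting entries by message class. The following sketch models that selection with stand-in types so that it runs on its own: EventSketch and SeverityFilterSketch are hypothetical, their level constants and fields are illustrative only, and the real filtering is done by extractSignificantEntries() as documented in the javadoc API reference.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for upCast's LogEvent; constants and fields here are illustrative only. */
final class EventSketch {
    static final int FATAL = 0, ERROR = 1, WARN = 2, INFO = 3;
    final int level;      // message class
    final int code;       // numerical message code
    final String message; // human readable message

    EventSketch(int level, int code, String message) {
        this.level = level;
        this.code = code;
        this.message = message;
    }
}

final class SeverityFilterSketch {
    /** Returns the events whose level is one of the given levels, in their original order. */
    static List<EventSketch> select(List<EventSketch> events, int... levels) {
        List<EventSketch> hits = new ArrayList<>();
        for (EventSketch event : events) {
            for (int level : levels) {
                if (event.level == level) {
                    hits.add(event);
                    break;
                }
            }
        }
        return hits;
    }
}
```

An empty result corresponds to the case where processing can continue because only warnings or less severe entries were reported.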

4.2. Error handling tidbits

Using the extractSignificantEntries() method, you can specify in great detail which messages you are interested in. For more information on how to use this method, see the javadoc API reference.

The message codes are all constants of a special class, Msg. See the javadoc API reference for a description of the currently defined message codes and the number and semantics of parameters available for a specific message.

Chapter 15. upcast-runner Ant task

upCast’s distribution jar includes an Ant task that lets you run upCast pipelines from files (*.ucdoc) from within Ant. This has the advantage that you usually do not have to create the Ant task code anew whenever you make minor to moderate changes to your pipeline. To use it, you first have to define the task for use by Ant, then create the correct sub-structure of the upcast-runner task.

Tip

To quickly construct an Ant build file code template for the upcast-runner task, use the File > Export to Ant > using 'upcast-runner' Task command. You can then modify this generated code to your liking or include it into an existing build file.

1. Defining the upcast-runner task

To define the task, use the following code:

<taskdef
    classname="de.infinityloop.upcast.ant.UpcastRunnerAntTask"
    name="upcast-runner"
    classpath="upcast.jar"
/>

For upcast.jar, you must specify the path to the distributed upcast.jar file. E.g., if you have a specific tasks folder next to your build file, you should copy the upcast.jar file there and specify ${basedir}/tasks/upcast.jar.

2. Structure of the upcast-runner task

<upcast-runner
    file="/path/to/pipeline.ucdoc"
    logfilter="DEBUG"
    sourceparam="SourceFile"
>
    <source dir="...">
        <include name="pattern" />
        …
    </source>

    <catalogs>
        <catalog file="..." />
    </catalogs>

    <param name="..." value="..." />
    …

</upcast-runner>

On the upcast-runner task itself, some global parameters need to be set, above all file, which is the absolute path to the pipeline to be run by this task.

The upcast-runner task can contain the following nested elements: source (to set the source files; see below for the exact semantics), licensefile (to specify the license file to use), catalogs (to specify global catalog files to use; needed to resolve the PUBLIC ID of the pipeline in case the task is used to run a parameter set (*.ucpar)), and one or more param elements setting the pipeline's public parameters.

We’ll discuss each of these elements in more detail in the following.

2.1. upcast-runner

2.1.1.  Description

This is the root element of the upcast-runner Ant task.

2.1.2. Parameters

| Attribute | Description | Required |
|---|---|---|
| file | The absolute path to the pipeline or parameter set file to run. | yes |
| logfilter | The filter specification for emitted logging events. The filter specification syntax is described here. The default is ERROR. | no |
| logfile | The absolute path of a file to write logging output to. When not specified, upCast's default log file is used at {Log File}. | no |
| sourceparam | The name of the pipeline variable that is set, in turn, to each item selected by the source element (for batches). The default variable name used when this attribute is not specified is SourceFile. | no |

2.1.3. Nested elements

The upcast-runner element supports the following nested elements:

  • source

  • licensefile

  • catalogs

  • param

2.2. source

2.2.1. Description

The source element is a very special element. Normally, an upCast pipeline does not inherently support batching, because a pipeline is described only in terms of a specified, single source file (though this might be different for each module). To make this single source file easily available, the Ant task makes use of the upCast parameter system and allows you to pre-set the pipeline:SourceFile variable (or a differently named variable if the sourceparam attribute on the upcast-runner task element is used).

The source element is the equivalent to setting this variable. This works as follows:

  1. For each file selected by any nested source element in an upcast-runner task, the pipeline as described by the upcast-runner element is executed.

  2. In each iteration, the current file is set as the pipeline variable SourceFile (the default, or the variable name optionally specified by the sourceparam attribute on the upcast-runner task element) and is then available as ${pipeline:SourceFile} (or ${pipeline:variablename}, respectively) for all modules in the pipeline.

Note

Don’t forget to quote upCast variable references in an Ant build-file so that Ant does not try to resolve these variable references (which are otherwise very similar in syntax). See the param element description of the upcast task in Chapter 16 for details and an example.

The source element is of the Ant core type fileset as described in the Ant manual, where you will find all attributes and allowed nested elements described in detail.

2.2.2. Parameters

The source element has the same attributes as the fileset Ant core type.

2.2.3. Nested elements

The source element has the same nested elements as the fileset Ant core type.

2.3. licensefile

This optional element designates the upCast license file to use. If you specify a license explicitly using this element, it overrides any license setting made directly in the pipeline (or parameter set) document to be run.

2.3.1. Parameters

The licensefile element has the following attribute:

| Attribute | Description | Required |
|---|---|---|
| file | The full, absolute path to the license file. You can use relative addressing based on the pipeline base directory using "$${pipeline:base}…" notation. | yes |

2.4. catalogs

This element groups all catalog elements.

2.4.1. Nested elements

The catalogs element supports the following nested elements:

  • catalog

2.5. catalog

This element designates an OASIS or XML catalog file.

2.5.1. Parameters

The catalog element has the following attribute:

| Attribute | Description | Required |
|---|---|---|
| file | The full, absolute path to the catalog file. You cannot use relative addressing based on the pipeline base directory on this parameter, as it is evaluated even before the pipeline document is read and the context is established. | yes |

2.6. param

2.6.1. Description

Used to set a pipeline parameter. The parameter names available for use depend on the pipeline definition. To learn the parameters, we recommend running File > Generate Documentation… on the pipeline this task should execute, which generates documentation of the pipeline, including the parameters and value ranges it supports.

2.6.2. Parameters

| Attribute | Description | Required |
|---|---|---|
| name | The parameter’s name. | yes |
| value | The parameter’s value. | yes |

Chapter 16. upcast Ant task

upCast’s distribution jar includes an Ant task that lets you run upCast pipelines from within Ant. To use it, you have to first define the task for use by Ant, then create the correct sub-structure of the upcast task.

Tip

To quickly construct an Ant build file code template for a pipeline you have already built using the GUI, use the File > Export to Ant > as self-contained Task… command. You can then modify this generated code to your liking or include it into an existing build file.

We recommend that instead of this upcast task, you should use the upcast-runner task whenever possible for your convenience.

1. Defining the upcast task

To define the task, use the following code:

<taskdef
    classname="de.infinityloop.upcast.ant.UpcastAntTask"
    name="upcast"
    classpath="upcast.jar"
/>

For upcast.jar, you must specify the path to the distributed upcast.jar file. E.g., if you have a specific tasks folder next to your build file, you should copy the upcast.jar file there and specify ${basedir}/tasks/upcast.jar.

2. Structure of the upcast task

<upcast
    basedir="/pipeline-base/"
    name="instancename"
    sourceparam="SourceFile"
>
    <source dir="...">
        <include name="pattern" />
        …
    </source>
    <settings>
        <licensefile file="..." />
        <logging file="..." filter="..." />
        <catalogs>
            <catalog file="..." />
            …
        </catalogs>
        <encodings>
            <encoding file="..." />
            …
        </encodings>
        <parameters>
            <parameter name="..." value="..." />
            …
        </parameters>
    </settings>
    <pipeline>
        <module type="..." name="...">
            <param name="..." value="..."/>
            …
        </module>
        …
    </pipeline>
</upcast>

On the upcast task itself, some global parameters need to be set, above all basedir (which directly translates to upCast’s ${pipeline:base} variable and is accessible as such in parameter values).

The upcast task can contain the following elements as nested elements: source (to set the source files; see below for the exact semantics), settings (for some essential pipeline settings), and pipeline (describing the pipeline to run).

A pipeline consists of an ordered sequence of module elements (corresponding to the standard upCast modules), each of which can have any number of param elements (to set module parameters).

We’ll discuss each of these elements in more detail in the following.

2.1. upcast

2.1.1.  Description

This is the root element of the upCast Ant task. It encapsulates a complete upCast pipeline with associated parameters.

2.1.2. Parameters

| Attribute | Description | Required |
|---|---|---|
| basedir | The equivalent of the ${pipeline:base} pipeline variable when running in the GUI, or of setting it via the API call setPipelineBaseURI("..."). You can access this base directory in parameter values that support variable expansion using ${pipeline:base}. | yes |
| name | Sets a descriptive name for this running task instance that is used e.g. in log output. | no |
| sourceparam | The name of the pipeline variable that is set, in turn, to each item selected by the source element (for batches). The default variable name used when this attribute is not specified is SourceFile. | no |

2.1.3. Nested elements

The upcast element supports the following nested elements:

  • source

  • settings

  • pipeline

2.2. source

2.2.1. Description

The source element is a very special element. Normally, an upCast pipeline does not inherently support batching, because a pipeline is described only in terms of a specified, single source file (though this might be different for each module). To make this single source file easily available, the Ant task makes use of the upCast parameter system and allows you to pre-set the pipeline:SourceFile variable (or a differently named variable if the sourceparam attribute on the upcast task element is used).

The source element is the equivalent to setting this variable. This works as follows:

  1. For each file selected by any nested source element in an upcast task, the pipeline as described by the pipeline element is executed.

  2. In each iteration, the current file is set as the pipeline variable SourceFile (the default, or the variable name optionally specified by the sourceparam attribute on the upcast task element) and is then available as ${pipeline:SourceFile} (or ${pipeline:variablename}, respectively) for all modules in the pipeline.

Note

Don’t forget to quote upCast variable references in an Ant build-file so that Ant does not try to resolve these variable references (which are otherwise very similar in syntax). See the param element description below for details and an example.

The source element is of the Ant core type fileset as described in the Ant manual, where you will find all attributes and allowed nested elements described in detail.

2.2.2. Parameters

The source element has the same attributes as the fileset Ant core type.

2.2.3. Nested elements

The source element has the same nested elements as the fileset Ant core type.

2.3. settings

This element groups all settings that pertain to the whole pipeline execution environment (not just individual modules). It has no attributes.

2.3.1. Nested elements

The settings element supports the following nested elements:

  • licensefile

  • logging

  • catalogs

  • encodings

  • fontconfig

  • parameters

2.4. licensefile

This element designates the upCast license file to use.

2.4.1. Parameters

The licensefile element has the following attribute:

| Attribute | Description | Required |
|---|---|---|
| file | The full, absolute path to the license file. You can use relative addressing based on the pipeline base directory using "$${pipeline:base}…" notation. | yes |

2.5. logging

This element specifies the log file destination (if you are using upCast’s logging setup unchanged) and the logging filter specification.

2.5.1. Parameters

The logging element has the following attributes:

| Attribute | Description | Required |
|---|---|---|
| file | The full, absolute path to the log file. | no |
| filter | The filter specification for emitted logging events. The filter specification syntax is described here. The default is INFO. | no |

2.6. catalogs

This element groups all catalog elements.

2.6.1. Nested elements

The catalogs element supports the following nested elements:

  • catalog

2.7. catalog

This element designates an OASIS or XML catalog file.

2.8. encodings

This element is a wrapper for one or more encoding elements.

2.8.1. Nested elements

The encodings element supports the following nested elements:

  • encoding

2.9. encoding

The encoding element is used to tell the upcast task about any custom encoding files you are using. The encoding element is of the Ant core type fileset as described in the Ant manual, where you will find all attributes and allowed nested elements described in detail.

2.9.1. Parameters

The encoding element has the same attributes as the fileset Ant core type.

2.9.2. Nested elements

The encoding element has the same nested elements as the fileset Ant core type.

2.10. fontconfig

This element lets you specify a font configuration override. The override source code is specified as this element’s child text content.

This element has no attributes.

2.11. parameters

This element is a wrapper for one or more param elements. The param element in this context (as descendant of the settings element) specifies any parameters set and defined via the Simple View. The set of pipeline variables is pre-populated with these values before the pipeline element and its module children are evaluated.

Any implicit variable setting by a source element contained within the upcast task is performed after having set all of the param elements contained within this parameters element.
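The ordering described above (parameters first, then the source variable, once per selected file) can be modelled as follows. This is only an illustrative sketch: BatchSketch and runBatch() are hypothetical, a plain Map stands in for the pipeline variable realm, and the Consumer stands in for executing the pipeline element. It is not the task's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Illustrative model of one batch run; not the Ant task's real code. */
final class BatchSketch {
    /**
     * For each selected file: start from the configured parameters, then set the
     * source variable (so it wins over a parameter of the same name), then run once.
     * Returns the number of pipeline runs.
     */
    static int runBatch(List<String> files, String sourceParam,
                        Map<String, String> params, Consumer<Map<String, String>> pipeline) {
        int runs = 0;
        for (String file : files) {
            Map<String, String> vars = new LinkedHashMap<>(params); // parameters first
            vars.put(sourceParam, file);                           // source variable afterwards
            pipeline.accept(vars);
            runs++;
        }
        return runs;
    }
}
```

In this model, a parameter that happens to be named like the source variable is overwritten for every file, which is the practical consequence of the ordering stated above.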

2.11.1. Nested elements

The parameters element supports the following nested elements:

  • param

2.12. pipeline

2.12.1. Description

The pipeline element groups the modules to be executed and specifies their order of execution.

2.12.2. Parameters

The pipeline element has no attributes.

2.12.3. Nested elements

A pipeline must have at least one nested module element.

2.13. module

2.13.1. Description

A module element specifies the type and (via its param nested elements) the configuration of an upCast module to be run.

2.13.2. Parameters

| Attribute | Description | Required |
|---|---|---|
| type | The module’s class ID. One of: pipelinevars, rtfimport, upl, uplcode, sectioner, grouper, xmlexport, css, commandline, unicodetranslator, validator, rtfexport, xslt, xmlimport, extpipeline. | yes |
| name | Can be used to set a descriptive name for the module or its intended usage. This is equivalent to the module’s Name parameter. | no |

2.13.3. Nested elements

A module element can have any number of nested param elements.

2.14. param

2.14.1. Description

Used to set a module parameter. For a description of the parameters available for a specific module, see its description earlier in this document.

2.14.2. Parameters

| Attribute | Description | Required |
|---|---|---|
| name | The parameter’s name. | yes |
| value | The parameter’s value. You can use the usual upCast variable references in the value. | yes |

Note

To differentiate an Ant variable reference (which is expanded by the Ant runtime before being passed as the actual value to the code backing the param element) from upCast variables which are resolved by the upcast task’s implementation, you need to quote the leading $ sign. This e.g. means that

<param name="P" value="${SourceFile}"/>

will set the value of the parameter P to the contents of the Ant property SourceFile, whereas

<param name="P" value="$${SourceFile}"/>

will set the value of the parameter P to the string value "${SourceFile}" (which will be resolved by upCast’s built-in variable resolver from upCast’s variable pool).
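The effect of quoting the $ sign can be reproduced with a simplified model of Ant's property expansion. The sketch below is an approximation written for illustration, not Ant's actual implementation; AntEscapeSketch and expand() are hypothetical names. It covers the two rules that matter here: "$$" yields a literal "$", and "${name}" is replaced by the value of the Ant property name when that property is defined.

```java
import java.util.Map;

/** Simplified model of Ant property expansion, for illustration only. */
final class AntEscapeSketch {
    static String expand(String value, Map<String, String> props) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < value.length()) {
            char c = value.charAt(i);
            boolean hasNext = i + 1 < value.length();
            if (c == '$' && hasNext && value.charAt(i + 1) == '$') {
                out.append('$'); // "$$" is the escaped form of a single "$"
                i += 2;
            } else if (c == '$' && hasNext && value.charAt(i + 1) == '{') {
                int end = value.indexOf('}', i + 2);
                if (end < 0) { // unterminated reference: copy the rest literally
                    out.append(value.substring(i));
                    break;
                }
                String name = value.substring(i + 2, end);
                String replacement = props.get(name);
                // Undefined properties are left as written.
                out.append(replacement != null ? replacement : value.substring(i, end + 1));
                i = end + 1;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

With an Ant property SourceFile defined, expand() turns "${SourceFile}" into the property's value, while "$${SourceFile}" comes out as the literal text "${SourceFile}", which is what upCast's own variable resolver then receives.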

Chapter 17. Logging Architecture

Schema of upCast’s logging architecture

upCast uses an event-based logging system, which also serves a second purpose, namely transporting warning and error states through the system. There are several hooks where you can intercept and react to those states (= log events) in a programmatic way.

The basic concepts in this architecture are: Log Event Sources, which generate new log events; Log Event Processors, which evaluate, process, (possibly) forward and programmatically change or filter generated log events; and Log Event Filters, which discard log events you are not interested in and therefore serve to conserve storage space and increase performance.

1. Log Event processing

Each component in upCast, which can be a module or pipeline, has its own Log Event Source and Log Event Processor.

The Log Event Source sends each freshly created Log Event down two distinct paths: the global Log Event path and the component's Log Event path.

1.1. Global Log Event path

The global Log Event path cannot be intercepted or filtered within a component, meaning that all events ever created are sent down this path and will show up in the Central Logging Hub. There, the Log Events of all running component instances and the application are merged into a single stream of Log Events that any number of Log Writers can attach to. Each Log Writer defines its own, independent Log Event Filter (if desired).

Two Log Writers are pre-defined: The Live Log Window and upCast's global log file.

1.1.1. upCast's global log file

The global application log file is created by a pluggable logging system. For the abstraction layer, upCast uses Log Bridge by Graham Lea. The default logging system implementation under this bridge in upCast is a slightly modified binding to java.util.logging, the Java 1.4 logging system implementation. The log filter used for the log file is controlled by the setting of the Log filter parameter in the upCast Preferences window. When running in API mode where the UI preferences are not available, you can pass the Log Event Filter setting using the Java system property de.infinityloop.logfilterspec. Additionally, you can set the log file location using the Java system property de.infinityloop.application.logfile.

1.1.2. Live Log Window

The Log Events available in live log mode of the logging window are all the events generated in the system, which are filtered for display by the respective setting in that window.

1.1.3. Custom Log Writers

In addition to those pre-defined Log Writers, you can programmatically create (via UPL's create-log-writer() function) any number of custom Log Writers with their own Log Event filter defined that – in contrast to the pre-defined Log Writers – also includes a context. The context is the component instance that was executing when (and wherefrom) the Log Writer was programmatically created. This allows you to filter Log Events not only by level and message code, but also by module or pipeline instance they originated from. This makes those custom Log Writers ideal for creating specialized log files that only cover e.g. a single conversion session or even just a detailed log of a single module's execution for debugging purposes.

1.2. Component's Log Event path

The Log Event Processor is situated in the component's Log Event path and therefore allows you to query each component for the events and error states it has generated during its execution. This path is the one you will want to use to react to certain generated Log Events of interest as they often also carry e.g. error state information.

The Log Event Processor of each component only sees and therefore can process Log Events that pass its Log Event Filter (with few exceptions, see below). It collects those Log Events, which either were created by its own Log Event Source or have been actively forwarded from its child components (e.g. in case of a pipeline component, this would be its module children).

1.3. Terminate Pipeline Execution Signal

A Log Event Processor can raise the Terminate Pipeline Execution Signal. This is a request to immediately terminate further processing of the currently running pipeline instance. This specifically means that any later components in that pipeline will not be run. Furthermore, that pipeline's Log Event Processor is informed that a child component has raised the Terminate Pipeline Execution Signal (it is passed that condition in a parameter to its custom finalization UPL function). It is then free to reset (= ignore) it or forward it to its parent (if any) by returning it as the result value of the function.

2. Component filter defaults

By default, new modules and pipelines have their Log Event filter set to "inherit", i.e. they use the same setting as their parent. This effectively means that the Log Event Filter setting of pipelines and modules is the same as (and linked to) the Log Event Filter setting in upCast’s Preferences window. If you want more (or less) logging information to be available in a module’s or pipeline’s Log Event Processor, you need to override the default setting.

3. Special situations

3.1. Exception in component initialization

When an error occurs during processing of a component's initialization code (which should never happen in a correctly configured or programmed configuration), it will raise the Terminate Pipeline Execution Signal and all of its collected ERROR or FATAL messages are automatically forwarded to the component's parent.

4. Filter specification syntax

The syntax is an arbitrary sequence of the following, whitespace- or comma-separated tokens, which are executed in the specified sequence:

+FATAL, +ERROR, +WARN, +INFO, +DEBUG, +VERBOSE, +DETAIL, +TRACE, +ALL

enable the respective message type

-FATAL, -ERROR, -WARN, -INFO, -DEBUG, -VERBOSE, -DETAIL, -TRACE, -ALL

disable the respective message type

FATAL, ERROR, WARN, INFO, DEBUG, VERBOSE, DETAIL, TRACE, ALL

enable exactly the messages of the specified level and higher, replacing any level settings made by earlier tokens (e.g., WARN is equivalent to -ALL,+FATAL,+ERROR,+WARN as far as levels are concerned)

+<msgconstantname>

enable that message using its symbolic name

+<msgconstantid1>..<msgconstantid2>

enable all messages between msgconstantid1 and msgconstantid2

-<msgconstantname>

disable that message using its symbolic name

-<msgconstantid1>..<msgconstantid2>

disable all messages between msgconstantid1 and msgconstantid2

inherit

is a special token that uses the filter settings of the parent object of the one it is specified on or, if that does not exist, ALL.

A log message is filtered by applying all tokens of a log filter expression in the sequence they are written, from left to right.

The message symbolic names can be looked up in the documentation of the defining Java class, de.infinityloop.msg.Msg.

Example 17.1. Log filter spec examples

INFO

lets pass all messages that have a level of INFO or higher (i.e. WARN, ERROR, FATAL).

DEBUG -INFO

lets pass all messages that have a level of DEBUG or higher, but that are not of level INFO

+ERROR +INFO

lets pass all messages that have either the level ERROR or the level INFO

ERROR +ColumnNumbersNotContiguous

lets pass all messages that have a level of ERROR or higher, including the warning for a non-contiguous numbering of colspec’s colnum attribute.

INFO --149

lets pass all messages that have a level of INFO or higher, except for the ColumnNumbersNotContiguous warning (whose numerical value is -149)

This example only demonstrates how to exclude a message with a negative id. You should always use symbolic names for built-in messages, since the number assignment may change between releases. The exception is custom messages you use in your pipelines, for which no symbolic constants are defined; for those, you must always use positive numbers.

DEBUG WARN

is the same as WARN, since the tokens are executed in the order of writing and therefore WARN overrides any settings with regard to levels that DEBUG may have set earlier.

ERROR +1..100 -WARN

lets pass all messages that have a level of ERROR or higher plus all custom log messages with an id between 1 and 100, but excluding from those all whose level is WARN.
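The left-to-right evaluation described above can be sketched in a few lines of Python. This is an illustration only, not upCast's implementation: the LogEvent class, the passes() function and the severity order in LEVELS are assumptions made for this sketch, and symbolic names are compared as plain strings rather than looked up in de.infinityloop.msg.Msg.

```python
import re
from dataclasses import dataclass

# Assumed severity order, most severe first (taken from the token list above).
LEVELS = ["FATAL", "ERROR", "WARN", "INFO", "DEBUG", "VERBOSE", "DETAIL", "TRACE"]


@dataclass
class LogEvent:
    """Hypothetical stand-in for a log event."""
    level: str      # one of LEVELS
    msg_id: int     # numeric message id
    name: str = ""  # symbolic message name, if any


def _selects(body, event):
    """Return True if the part of a token after its sign selects this event."""
    if body == "ALL":
        return True
    if body in LEVELS:
        return event.level == body
    if ".." in body:
        low, high = (int(part) for part in body.split("..", 1))
        return low <= event.msg_id <= high
    if re.fullmatch(r"-?\d+", body):
        return event.msg_id == int(body)
    return event.name == body


def passes(spec, event, parent_spec=None):
    """Apply all tokens of a filter spec to one event, left to right."""
    result = False
    for token in re.split(r"[\s,]+", spec.strip()):
        if not token:
            continue
        if token == "inherit":
            # Use the parent's filter or, if there is none, ALL.
            result = passes(parent_spec or "ALL", event)
        elif token[0] in "+-":
            if _selects(token[1:], event):
                result = token[0] == "+"
        elif token == "ALL":
            result = True
        elif token in LEVELS:
            # A bare level overrides whatever earlier tokens decided.
            result = LEVELS.index(event.level) <= LEVELS.index(token)
        else:
            raise ValueError(f"unknown filter token: {token}")
    return result
```

With this sketch, passes("INFO --149", LogEvent("WARN", -149, "ColumnNumbersNotContiguous")) evaluates to False, while the same event passes the spec "ERROR +ColumnNumbersNotContiguous".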


Chapter 18. Pipeline Templates

Pipeline Templates in general are no different from regular pipeline definitions. However, to be handled correctly within upCast, you must obey the following important points:

  • the main pipeline configuration file (*.ucdoc) must be named template.ucdoc (exactly like this!) and must reside in the top-level folder of the template

  • all resources that the pipeline template uses (XSLT files, UPL files, other resources) must be stored below its top-level folder

  • the name of the top-level folder will be used in upCast’s UI to refer to the template, so it should be descriptive but not too long

  • To make creating a parameter set from a template work, the template.ucdoc must have defined its UID pipeline property setting, and ideally upCast’s configuration (upCast preferences, Catalogs tab) should include a reference to a catalog file where that UID is mapped to the physical location of the pipeline template

To have upCast find, recognize and display a template in its UI, you need to make sure that its top-level folder is placed in one of the folders upCast is looking for pipeline templates. The places upCast looks for pipeline templates can be specified in upCast’s preferences on the Application Settings tab, Pipeline Template Paths parameter.

Example 18.1. Pipeline Template Example: Pipeline template default file layout

The default pipeline template file layout and file naming is as follows:

These files are located within the descriptively named template folder (use a name of your choice), which itself is placed into one of the locations for Pipeline Template Paths.


Chapter 19. Standard Folders and Locations

upCast makes use of certain locations in the running machine’s file system to store support and session information. These locations are different depending on the underlying native operating system that runs the Java Virtual Machine. This manual denotes these standard places by placing the name of the location in curly braces, e.g. {Application Support Folder}. To learn the actual location on the machine upCast is running on, open the View > System Information… window, where the name as enclosed in the curly braces is listed with its actual corresponding file system location.

The following standard folders and locations are currently defined:

{Application Support Folder}

The root folder for application support data.

{Licenses Folder}

This folder contains installed licenses for the software. License files must end in .uclicense to be recognized.

{JAR License Location Path}

This is the path in the distribution jar where you may store a file named upcast.uclicense. If this file exists at the specified location, it is used. It is then included in the license picker window.

{Log File}

The path to the application’s log file.

{preferences file}

The location of the preferences7.plist file where current application configuration parameters are persistently stored when the application is quit and where these settings are restored from upon next launch.

{Documentation Folder}

The root folder for application documentation data and related files.

{Documentation Root}

The file that is opened when the user chooses View > Built-in Help… By replacing the default file with a custom one at this location, you can provide custom help files or documentation for specific installations.

{Temporary Items Folder}

The path to the system-specific temporary items folder.

Chapter 20. Unicode translation map

upCast has a built-in mechanism for converting any Unicode character to any other Unicode character, string or even entity notation on export. This is done by means of a Unicode translation map. You can specify a Unicode translation map in various exporter modules as the final stage a character needs to pass through before actually getting written to the output file or stream.

1. Syntax

The syntax is simple: one conversion entry per line, and all lines starting with # or // are treated as comments.

A conversion entry has the form (notation similar to BNF):

conversion ::= unicodeNumber ‘=’ replacement

with:

unicodeNumber ::= hexNumber | decimalNumber
replacement ::= string | hexNumber | decimalNumber 
hexNumber ::= (‘0x’ | ‘0X’ | ‘$’)[0-9A-Fa-f]+
decimalNumber ::= [0-9]+ 
string ::= ‘"’ (asciiChar)* ‘"’ 
asciiChar ::= a one-byte character in the range from 32 to 127, excluding ‘"’

Note that there's no whitespace allowed around the '=' character.

Example 20.1. 

Here is a rather silly example, with the effect described in comments:

// First, we simply convert all spaces to a dot:
32="."
// Then, we convert all capital letter A’s to a 
// full, empty tag: <letter_a />
65="<letter_a />"
// And then, we discard all small 
// letters ‘u’ completely:
0x75=""
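To make the grammar concrete, the following Python sketch parses plain conversion entries into a lookup table and applies it to a string. It is a reading aid under stated assumptions, not upCast's parser: the names parse_map() and translate() are made up for this sketch, lines starting with '@' (options) are simply skipped, and leading or trailing whitespace on a line is ignored.

```python
import re

# hexNumber | decimalNumber, as in the grammar above
_NUMBER = r"(?:0[xX]|\$)[0-9A-Fa-f]+|[0-9]+"
# unicodeNumber '=' (string | number), with no whitespace around '='
_ENTRY = re.compile(rf'^({_NUMBER})=(?:"([\x20\x21\x23-\x7f]*)"|({_NUMBER}))$')


def parse_number(text):
    """Convert a hexNumber or decimalNumber token to an int."""
    if text[:2].lower() == "0x":
        return int(text[2:], 16)
    if text.startswith("$"):
        return int(text[1:], 16)
    return int(text)


def parse_map(source):
    """Return a dict mapping code points to replacement strings."""
    table = {}
    for raw in source.splitlines():
        line = raw.strip()
        # Skip empty lines, comments, and (in this sketch) options.
        if not line or line.startswith(("#", "//", "@")):
            continue
        match = _ENTRY.match(line)
        if match is None:
            raise ValueError(f"not a valid conversion entry: {line!r}")
        codepoint, string, number = match.groups()
        if string is None:
            # A numeric replacement names a single Unicode character.
            string = chr(parse_number(number))
        table[parse_number(codepoint)] = string
    return table


def translate(text, table):
    """Pass every character through the map; unmapped characters stay as-is."""
    return "".join(table.get(ord(ch), ch) for ch in text)
```

Applied to the three entries of the example above, translate("A bus", table) yields "<letter_a />.bs".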

2. Options

For your convenience, there are the following options (all indicated by a leading '@' character) you can write on a line:

2.1. @charref

@charref fromCodepoint toCodepoint fillerKey [formatstring]

This specifies how a certain range of code points should be preset. This saves you typing work if you need some range of characters to be output not in UTF-8 encoding, but e.g. as character references.

You can specify this option anywhere in a Unicode translation map; it takes effect (meaning: gets expanded and processed) at that specific location. You may use this to initialize a certain code range and then overwrite selected code points by specifying additional, normal translation rules (as described above) later on, which will then override the initialization performed by this option.

fromCodepoint

A decimal integer value specifying the start code point of the code range.

toCodepoint

A decimal integer value specifying the end code point of the range.

fillerKey

A string constant identifying the algorithm to use for filling the specified code point range.

dec

The code point range is filled with character references in decimal notation, e.g. &#1234; .

hex

The code point range is filled with character references in hexadecimal notation, e.g. &#x4D2; .

named

The code point range is filled with the named character entity references as defined in http://www.w3.org/TR/xml-entity-names/. Code points in the specified range for which there are no named character entity references defined are left as-is. This allows you to either output them as UTF-8 (do nothing), or in a specific character reference notation by preceding the @charref named option with e.g. the @charref dec option.

pattern

The formatstring parameter defines a configurable pattern as replacement. The format string is a standard Java MessageFormat format string, with the following placeholders available:

{0}

the Unicode codepoint of the current character in decimal number notation

{1}

the Unicode codepoint of the current character in hex number notation

{2}

the current character itself

Example 20.2. 

@charref 128 32767 dec

This line fills the Unicode translation map for all code points from 128 to 32767 (incl.) with decimal numerical character references.

@charref 128 256 hex
@charref 128 256 named

These two lines effectively fill the Unicode translation map for all code points from 128 to 256 (incl.) with named character entity references and set those code points for which there are no names defined to hexadecimal character references.

@charref $E000 $F8FF pattern "<illegal-char codepoint="{1}" />"

This presets the PUA area with an illegal-char element whose codepoint attribute is set to the hex value of the code point it represents/replaces.
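The interplay of the dec, hex and named filler keys, and in particular why the order of two @charref lines matters, can be sketched as follows. This is an illustration, not upCast's code: apply_charref() is a made-up name, and Python's html.entities.codepoint2name table (the HTML 4 entity names) is used only as a stand-in for the W3C entity name list referenced above, so the set of named code points is an assumption.

```python
from html.entities import codepoint2name  # stand-in for the W3C entity names


def apply_charref(table, start, end, filler):
    """Preset table[start..end] (inclusive) according to the filler key."""
    for cp in range(start, end + 1):
        if filler == "dec":
            table[cp] = f"&#{cp};"
        elif filler == "hex":
            table[cp] = f"&#x{cp:X};"
        elif filler == "named":
            name = codepoint2name.get(cp)
            # Code points without a name are left as they were.
            if name is not None:
                table[cp] = f"&{name};"
        else:
            raise ValueError(f"unsupported filler key: {filler}")
    return table


table = {}
apply_charref(table, 128, 256, "hex")    # first: everything as hex references
apply_charref(table, 128, 256, "named")  # then: names where they exist
```

After these two calls, code point 233 maps to "&eacute;", while code point 128, for which the stand-in table has no name, keeps the hexadecimal reference "&#x80;" from the first call.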


2.2. @fill

@fill fromCodepoint toCodepoint value

This specifies how a certain range of code points should be preset. This saves you typing work if you need some range of characters to all be set to a single output value.

You can specify this option anywhere in a Unicode translation map; it takes effect (meaning: gets expanded and processed) at that specific location. You may use this to initialize a certain code range and then overwrite selected code points by specifying additional, normal translation rules (as described above) later on, which will then override the initialization performed by this option.

The difference between @fill and @charref is that here, the complete range is set to the same specified replacement value.

fromCodepoint

A decimal integer value specifying the start code point of the code range.

toCodepoint

A decimal integer value specifying the end code point of the range.

value

The value to set each of the code points in the specified range to. This can be any value that is allowed on the right side of a normal conversion entry, so it can be either a single Unicode character specification or a fixed string value.

Example 20.3. 

@fill 0 8 "[ILLEGAL_XML_CHAR]"
@fill 11 12 "[ILLEGAL_XML_CHAR]"
@fill 14 31 "[ILLEGAL_XML_CHAR]"
@fill 55296 57343 "[ILLEGAL_XML_CHAR]"
@fill 65534 65535 "[ILLEGAL_XML_CHAR]"

These lines preset the Unicode translation map such that any occurrence of a character that is not allowed in XML 1.0 is output as the text data "[ILLEGAL_XML_CHAR]".


2.3. @invalid-xmlchar

@invalid-xmlchar formatstring

This specifies how invalid XML 1.0 characters should be mapped when they occur in PCDATA content. formatstring is (after expansion) the replacement string for any invalid XML 1.0 characters.

The format string is a standard Java MessageFormat format string, with the following placeholders available:

{0}

the Unicode codepoint of the offending character in decimal number notation

{1}

the Unicode codepoint of the offending character in hex number notation

{2}

the offending character itself

Additionally, each time a character replacement takes place, a log message of type InvalidXMLCharacter (id = -218, level = WARN) is generated.

Note

This option's definition will also be used as fallback for data in attributes when the more specific @invalid-xmlchar-attr option is not defined.

Example 20.4. 

To use, amend e.g. the XML Exporter's Unicode Translation Map by the following lines:

@invalid-xmlchar "##INVALIDCHAR={0} U+{1}##"

This will generate the following output for an offending character 0x1e in PCDATA content:

...preceding text ##INVALIDCHAR=30 U+1e## following text...
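The replacement can be sketched in Python as follows. The validity test follows the Char production of the XML 1.0 specification; everything else is an assumption for illustration: replace_invalid() is a made-up name, Python's str.format stands in for Java's MessageFormat (whose quoting and number formatting rules are not reproduced), and no log message is generated.

```python
def is_xml10_char(cp):
    """True if the code point matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def replace_invalid(text, formatstring):
    """Replace each invalid character by the expanded format string.

    Placeholders: {0} decimal code point, {1} hex code point, {2} the character.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if is_xml10_char(cp):
            out.append(ch)
        else:
            out.append(formatstring.format(cp, f"{cp:x}", ch))
    return "".join(out)
```

For instance, replace_invalid("a\x1eb", "##INVALIDCHAR={0} U+{1}##") returns "a##INVALIDCHAR=30 U+1e##b".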

2.4. @invalid-xmlchar-attr

@invalid-xmlchar-attr formatstring

This specifies how invalid XML 1.0 characters should be mapped when they occur in attribute content. formatstring is (after expansion) the replacement string for any invalid XML 1.0 characters in an element's attribute.

The format string is a standard Java MessageFormat format string, with the following placeholders available:

{0}

the Unicode codepoint of the offending character in decimal number notation

{1}

the Unicode codepoint of the offending character in hex number notation

{2}

the offending character itself

Additionally, each time a character replacement takes place, a log message of type InvalidXMLCharacter (id = -218, level = WARN) is generated.

Example 20.5. 

To use, amend e.g. the XML Exporter's Unicode Translation Map by the following lines:

@invalid-xmlchar-attr "##INVALIDCHARATTR={0} U+{1}##"

This will generate the following output for an offending character 0x1e in an element elem's attribute attr:

<elem attr="...preceding text ##INVALIDCHARATTR=30 U+1e## following text...">...</elem>

Chapter 21. CSS property unit table

This table associates arbitrary CSS <length> properties with a pair of unit and precision information. This is useful when the created style information in either the CSS style sheet or the style overrides in the XML output should be human readable, in which case you would provide a table with a unit of measurement that people are most familiar with (inches or centimeters, e.g.), and a reasonable precision like 2 decimal digits.

The default table uses cm as default unit, with a precision of 1 or 2 decimal digits, and pt for special properties like font-size.

1. Syntax

The syntax is simple: one unit association entry per line, and all lines starting with // are treated as comments.

An association entry has the form (notation similar to BNF):

association ::= propertyName ‘:’ ( (unit ‘,’ precision) | ‘#same’ )

with:

propertyName ::= CSS-property-name-identifier
unit ::= ‘m’ | ‘cm’ | ‘mm’ | ‘pt’ | ‘in’ | ‘pc’ | ‘px’ | ‘emu’ | ‘tw’ | ‘hp’
precision ::= [0-9]+

tw is a twip ("twentieth of a point") and the basic length unit used in RTF; 1tw = 0.05pt

emu is a unit used in RTF shape objects; 1cm=360,000emu

hp is a half-point and the unit used in RTF for specifying font sizes; 1hp = 0.5pt

The keyword #same requests that the unit should not be changed.

Required use of #same on selected properties

The use of the #same value is important for properties like line-height which can be either relative, a number or even a keyword, where converting to an absolute length would be impossible. Failing to specify #same on these properties may result in a conversion error.

Example 21.1. 

Here’s an example of a CSS property unit table similar to the one used as the default table in upCast:

@option-default-length-unit:mm 
@option-default-length-precision:2
font-size:pt,1
border-top-width:pt,1
border-right-width:pt,1
border-bottom-width:pt,1 
border-left-width:pt,1
-ilx-border-vertical-inside-width:pt,1
-ilx-border-horizontal-inside-width:pt,1
text-indent:mm,1
width:mm,1
height:mm,1
margin-left:mm,1
margin-right:mm,1
margin-top:mm,1
margin-bottom:mm,1
padding-left:mm,1
padding-right:mm,1
padding-top:mm,1
padding-bottom:mm,1
line-height:#same
border-spacing:pt,2
letter-spacing:pt,2
-ilx-list-marker-offset:tw,0
-ilx-header-offset:mm,1
size:mm,1
-ilx-column-width:mm,1
-ilx-column-gap:mm,1
-ilx-footer-offset:mm,1
size:mm,1

2. Options

There are two special options to specify default behavior:

Important

These options must be specified before any unit association for a specific CSS property.

2.1. @option-default-length-unit

@option-default-length-unit:unit

This specifies the default unit to use for all <length> properties that have no explicit entry in the unit table. unit is one of the unit identifiers listed in the Syntax section.

2.2. @option-default-length-precision

@option-default-length-precision:precision

This specifies the default precision (number of decimal digits) to use for all <length> properties that have no explicit entry in the unit table.
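How such a table drives the formatting of a length can be sketched in Python. This is not upCast's implementation: parse_unit_table() and format_length() are made-up names, the px factor (1px = 0.75pt) is an assumption, and the other factors follow from the unit definitions given in the Syntax section.

```python
# Points per unit. tw, hp and emu follow the definitions given above;
# the px value is an assumption (CSS reference pixel, 96 per inch).
PT_PER_UNIT = {
    "pt": 1.0, "in": 72.0, "pc": 12.0,
    "cm": 72 / 2.54, "mm": 72 / 25.4, "m": 7200 / 2.54,
    "tw": 0.05, "hp": 0.5, "emu": 72 / 914400, "px": 0.75,
}


def parse_unit_table(source):
    """Return (table, default); table maps property -> (unit, precision) or None."""
    default_unit, default_precision = None, None
    table = {}
    for raw in source.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue
        key, _, value = line.partition(":")
        if key == "@option-default-length-unit":
            default_unit = value
        elif key == "@option-default-length-precision":
            default_precision = int(value)
        elif value == "#same":
            table[key] = None  # keep the value and unit unchanged
        else:
            unit, precision = value.split(",")
            table[key] = (unit, int(precision))
    return table, (default_unit, default_precision)


def format_length(prop, value, unit, table, default):
    """Express 'value unit' in the unit and precision configured for prop."""
    target = table.get(prop, default)
    if target is None or None in target:
        return f"{value:g}{unit}"  # '#same' or no usable default
    target_unit, precision = target
    converted = value * PT_PER_UNIT[unit] / PT_PER_UNIT[target_unit]
    return f"{converted:.{precision}f}{target_unit}"
```

With a table containing width:mm,1 and font-size:pt,1 plus defaults of mm and 2, a width of 1in is written as 25.4mm, a font size of 24hp as 12.0pt, and an unlisted property of 1cm as 10.00mm.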

Chapter 22. Fonts and Encodings

1. Font Configuration

RTF files need to specify which encoding each font uses and what properties it has. A rendering application uses this information to determine the best matching font on a platform where the exact font specified is not available. Additionally, the rendering application uses the font's encoding to correctly interpret the characters found in the RTF file.

However, this mechanism does not support custom fonts with a special mapping of their constituting characters to Unicode code points. This is what the Font Configuration setup is for. upCast comes with a default font configuration embedded in the application. You may extend and/or override it by providing a custom font configuration override or extension. This can be done either at the application or pipeline level. Here, you can specify standard font properties based on the font name, in particular any custom encoding or codepage the font uses.

Note

The default Font Configuration can be found at the following location in the package hierarchy in the upcast.jar jar file:

de/infinityloop/resources/config/stdfonts.config

What follows is an informal description of the font configuration format and the necessary properties, followed by a description of the search algorithm employed by upCast to find the properties for a given font.

1.1. Properties and Values

The following special properties are used in the stdfonts.config file:

1.1.1. rtf-font-family

Determines the general RTF font family a font belongs to based on its design. An RTF rendering application will use this information to find a font with similar appearance when an exact match cannot be found.

Supported values: roman, swiss, symbol, modern, script, decor, tech, bidi

1.1.2. codepage

This indicates the Windows codepage the font uses for its encoding.

Supported values: codepageAsInteger, -1, -2, 10000, -1000, -1001, -1002, -1004, -1005

The special values have the following meaning:

-1

Uses the font's encoding, specified in its font table entry in the document being processed. This is the best choice for normal fonts.

-2

Uses the document's default encoding. This is needed only in very rare situations when processing legacy documents and should be used only by experts who know why they need it!

10000

Identifies the Mac Roman encoding.

-1000

Identifies the Private Use Area mapper.

-1001

Identifies the standard encoding of the Symbol font.

-1002

Identifies the encoding of the Wingdings font.

-1004

Identifies the encoding of the Zapf Dingbats font.

-1005

Identifies Unicode fonts like "Arial Unicode MS" (i.e. this is essentially an identity mapping)

1.1.3. unicode-offset

Specifies the Unicode codepoint offset for this particular font. On platforms like Macintosh and Windows, fonts that have no Unicode mapping defined, like "Webdings" or "Hoefler Text Ornaments", are mapped 1:1 into the PUA (Unicode Private Use Area). Normally, this is the area of U+F000…U+F0FF, but by using the U-xxxx notation below, you can set the offset anywhere you require.

Supported values: normal | private | U-xxxx

with private being equivalent to U-F000, which is the Unicode codepoint offset (should be in the Private Use Area (PUA)) where this mapping starts, and normal being equivalent to U-0000, which is also the default if the property is not specified.
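As a small illustration of the three value forms, the following Python sketch turns a unicode-offset value into a number and applies it to a character code from a single-byte font. The function names are made up for this sketch, and the plain addition of offset and character code is an assumption based on the 1:1 mapping described above.

```python
import re


def parse_unicode_offset(value):
    """Map 'normal', 'private' or 'U-xxxx' to a numeric code point offset."""
    if value == "normal":
        return 0x0000
    if value == "private":
        return 0xF000
    match = re.fullmatch(r"U-([0-9A-F]{4})", value)
    if match is None:
        raise ValueError(f"not a valid unicode-offset value: {value}")
    return int(match.group(1), 16)


def map_font_char(char_code, offset_value):
    """Map a font-specific character code (0..255) to a Unicode character."""
    return chr(parse_unicode_offset(offset_value) + char_code)
```

With the value private, character code 0x41 of such a font ends up at U+F041; with U-E100 it would end up at U+E141.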

1.1.4. renderhint-fontswitch

Note

This property is only relevant for the RTF Exporter ("downCast") module.

When the RTF exporter encounters a Unicode character to render to RTF, it first looks whether this character is part of the encoding of the current font. If it is, it is written according to RTF specifications. However, when this Unicode character is not part of the encoding of the current font, the module tries to look up a font in the names specified using the @font-search-list option in order. The first one it finds will be used to write the character to RTF. However, for a subsequent RTF reader to pick this up correctly, the module must write a switch of font for this specific character. This property specifies which method the RTF exporter should use for this, if possible:

font

The RTF exporter will write a simple RTF font switch {\fx c}.

field

The RTF exporter will write the character using a SYMBOL field. This is only possible for single-byte-fonts.

auto

The RTF exporter decides how to best write the character.

1.1.5. renderhint-unicode

Note

This property is only relevant for the RTF Exporter ("downCast") module.

When the RTF exporter needs to write a character, it can do it in two ways: either just the character for the current font’s encoding, or additionally as the original Unicode codepoint. By specifying one of the following values for a font, you can tell the implementation which method it should use (if possible):

always

The RTF Exporter will always write the character in the current encoding and its Unicode equivalent

never

The RTF exporter will not write the Unicode equivalent

auto

The RTF exporter decides how to best write the character.

1.2. Options

Note

This property is only relevant for the RTF Exporter ("downCast") module.

The following general options are available:

1.2.1. @font-search-list

This option lets you specify a comma-separated list of font names in which the RTF exporter will search for an incoming Unicode character to be output to RTF if it is not part of the current encoding. This lets you specify precedences, e.g. you may want to list the actually installed fonts on your particular system first.

If the RTF Exporter does not find a match in the listed fonts, it will use Unicode notation with an underscore ‘_’ as replacement character.

Example 22.1. 

@font-search-list "Arial Unicode MS"

will fall back to the Arial Unicode MS font for characters that are not part of the currently active font's encoding table.


1.2.2. @mode

This option controls the behavior of entries in a user-defined stdfonts.config file.

The default value is override. The default stdfonts.config has mode override specified and is always read first.

replace

Any existing entries when reading this option are cleared, new entries are added in sequence

override

New entries are prepended (as a whole block, in sequence) to any existing ones, effectively overriding already existing font definitions for the same font since they are found first on searching

supplement

New entries are appended to the list of existing ones, i.e. only those for which there isn’t already a definition in the standard table take any effect

Example 22.2. 

Writing

@mode replace

at the top of a font configuration file or code snippet will completely discard (=replace) any existing font mappings with the ones that follow after this option.
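The effect of the three modes on the ordered list of entries can be sketched as follows. merge_font_entries() is a made-up name for this illustration, and entries are modeled as simple (font name, properties) pairs; since lookups take the first match, the position of a block in the list decides which definition wins.

```python
def merge_font_entries(existing, new_entries, mode="override"):
    """Combine a block of new entries with the existing ones by @mode."""
    if mode == "replace":
        # Existing entries are cleared; only the new block remains.
        return list(new_entries)
    if mode == "override":
        # The new block goes first, so its definitions are found first.
        return list(new_entries) + list(existing)
    if mode == "supplement":
        # The new block goes last, so it only fills gaps.
        return list(existing) + list(new_entries)
    raise ValueError(f"unknown @mode value: {mode}")


defaults = [("Symbol", {"codepage": -1001}), ("Arial", {"codepage": -1})]
custom = [("Arial", {"codepage": 1250})]
merged = merge_font_entries(defaults, custom, "override")
```

In this example the custom Arial entry precedes the default one in merged, so a first-match search would return codepage 1250 for Arial.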


1.3. File structure

The file structure is line-based. Each line identifies a set of font names with a set of properties:

mappings ::= fontlist ‘=’ propertyset
fontlist ::= font ( ‘, ‘ font)*
font ::= fontname | ‘"’ fontname ‘"’
propertyset ::= ‘rtf-font-family: ‘ ffval ‘; codepage: ‘ ‘-’? [0-9]+ ‘;’
                ‘ unicode-offset: ‘ uoval ‘; renderhint-fontswitch: ‘ rhfs ‘; renderhint-unicode: ‘ rhuc ‘;’ 
ffval ::= ‘roman’ | ‘swiss’ | ‘symbol’ | ‘modern’ | ‘script’ | ‘decor’ | ‘tech’ | ‘bidi’
uoval ::= ‘normal’ | ‘private’ | ‘U-‘ [0-9A-F]{4}
rhfs ::= ‘font’ | ‘field’ | ‘auto’
rhuc ::= ‘always’ | ‘never’ | ‘auto’
fontname ::= name of font

Note

Note that you must use CSS-style escapes (or numerical character entities of the form &#...;) to generate Unicode characters for specifying font names using characters outside the ASCII range.

All lines starting with // denote a comment line and are ignored, as are empty lines.

1.4. Matching Algorithm

To avoid having to explicitly define every font in the stdfonts.config file which might ever occur in a style sheet, the application implements a multi-stage search algorithm for a matching property definition entry as follows:

First, the default font configuration is read (it has a @mode option value of override).

Then any custom addition or override from the application or pipeline settings is read, and its entries are handled as a whole block according to the value specified for the @mode option. If no @mode option is specified, a default of override is assumed. Within the resulting concatenated font configuration, the following search algorithm is employed:

  1. A search for the exact name (considering case) is performed. The first matching entry is used if it exists.

  2. A search for the exact name, but ignoring case, is performed. The first matching entry is used if it exists.

  3. A search for a font name is performed that matches the start of the actual name. So if the characteristics for "Univers Bold" are requested, and there is an entry "Univers" in the font configuration, then its properties are used. Case is ignored.

  4. A search for a font name is performed that is contained in the actual name. So if the characteristics for "L Univers 44" are requested, and there is an entry "Univers" in the font configuration, then its properties are used because the string "Univers" is contained in the actual font name. Case is ignored.
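The four search stages can be sketched in Python as follows. This is an illustration, not upCast's code: find_font_properties() is a made-up name, and the concatenated font configuration is modeled as an ordered list of (configured name, properties) pairs.

```python
def find_font_properties(font_name, entries):
    """Return the properties of the first entry matching in the earliest stage."""
    lowered = font_name.lower()
    stages = (
        lambda cfg: cfg == font_name,                  # 1. exact, case-sensitive
        lambda cfg: cfg.lower() == lowered,            # 2. exact, ignoring case
        lambda cfg: lowered.startswith(cfg.lower()),   # 3. entry is a prefix
        lambda cfg: cfg.lower() in lowered,            # 4. entry is contained
    )
    for matches in stages:
        for configured, properties in entries:
            if matches(configured):
                return properties
    return None  # no entry applies to this font


entries = [
    ("Univers", {"rtf-font-family": "swiss"}),
    ("Times", {"rtf-font-family": "roman"}),
]
```

With these entries, both "Univers Bold" (stage 3) and "L Univers 44" (stage 4) resolve to the properties of the Univers entry, and "times" resolves to the Times entry in stage 2.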

1.5. Default font configuration

Here's the default font configuration used by upCast:

// Default reading mode
@mode override

// The default search order: none! Use Unicode instead.
@font-search-list 

// Default serif font properties:
"Times New Roman", "L Centennial", Times, serif, Palatino, Georgia = rtf-font-family: roman; codepage: -1;

"Times New Roman Greek" = rtf-font-family: roman; codepage: 1253;
"Times New Roman CE" = rtf-font-family: roman; codepage: 1250;
"Times New Roman Cyr" = rtf-font-family: roman; codepage: 1251;
"Times New Roman Tur" = rtf-font-family: roman; codepage: 1254;
"Times New Roman (Hebrew)" = rtf-font-family: roman; codepage: 1255;
"Times New Roman (Arabic)" = rtf-font-family: roman; codepage: 1256;
"Times New Roman Baltic" = rtf-font-family: roman; codepage: 1257;

// Default sans serif font properties:
Arial, System, Univers, Verdana, Helvetica, Tahoma, Optima, Futura, "Trebuchet MS", Lucida, sans-serif = rtf-font-family: swiss; codepage: -1;
"Arial CE" = rtf-font-family: swiss; codepage: 1250;
"Arial Greek" = rtf-font-family: swiss; codepage: 1253;
"Arial Cyr" = rtf-font-family: swiss; codepage: 1251;
"Arial Tur" = rtf-font-family: swiss; codepage: 1254;
"Arial (Hebrew)" = rtf-font-family: swiss; codepage: 1255;
"Arial (Arabic)" = rtf-font-family: swiss; codepage: 1256;
"Arial Baltic" = rtf-font-family: swiss; codepage: 1257;
"Arial Unicode MS" = rtf-font-family: swiss; codepage: -1005;

// Default monospaced font properties:
Courier, ProFont, Monaco, "Courier New", Pica, monospace = rtf-font-family: modern; codepage: -1;

// Some other fonts
"SimSun Western" = rtf-font-family: roman; codepage: 1252;

// --------------------------------------
// Two-byte fonts we know about
// --------------------------------------

// fcharset134 fonts:
SimSun = codepage: 936; renderhint-unicode: auto; renderhint-fontswitch: font;
PMingLiU, SimHei, "MS Mincho" = codepage: 936; renderhint-unicode: auto; renderhint-fontswitch: font;
\00534e\006587\005f69\004e91, \005b8b\004f53, \009ed1\004f53, \00534e\006587\0096b6\004e66, \0065B9\006B63\008212\004F53 = codepage: 936; renderhint-unicode: auto; renderhint-fontswitch: font;

// fcharset128 fonts:
"MS PGothic" = rtf-font-family: roman; codepage: 932; renderhint-unicode: auto; renderhint-fontswitch: font;

// fcharset129 fonts:
"Gulim" = rtf-font-family: roman; codepage: 949; renderhint-unicode: auto; renderhint-fontswitch: font;

// Symbol fonts
Webdings = rtf-font-family: symbol; codepage: -1000; unicode-offset: private; renderhint-fontswitch: field;
ZapfDingbats, "Zapf Dingbats", "ITC Zapf Dingbats" = rtf-font-family: symbol; codepage: -1004; unicode-offset: private; renderhint-fontswitch: field;
Symbol = rtf-font-family: symbol; codepage: -1001; unicode-offset: private; renderhint-fontswitch: field;

Wingdings = rtf-font-family: symbol; codepage: -1002; unicode-offset: private; renderhint-fontswitch: field;

// Proprietary Fonts with no Unicode mapping of *any* contained character; add your own to the list if required:
Tufa, "Hoefler Text Ornaments", Traffic, StarBats, "Score Font 4.0", Marl, Frets, FretBoard, "Apple Symbols", Anastasia, Aloisen = rtf-font-family: symbol; codepage: -1000; unicode-offset: private; renderhint-fontswitch: field;

2. Custom Encodings

2.1. How it works

upCast comes complete with virtually all default encodings you can use in RTF or Word, including many two-byte encodings. This means that normally, you do not need to provide a custom encoding.

The default encodings are hard-coded with optimizations done for each specific encoding to provide efficient access, since the mapping functions are called for each character that passes through upCast. These default encodings are therefore not directly user-accessible. However, there are sometimes occasions where you’ll need to use a custom encoding, especially when you are using custom fonts.

upCast provides a custom encoding loader and handler which lets you specify your own mappings from character code points in the font to Unicode by means of a simple text file. Both one-byte and two-byte encodings can be specified in this way.

To create a custom encoding, you need to create an ASCII text file with the extension .encoding which specifies both the mapping of the individual code points to Unicode and the code page it implements. You can also give it a name so that it is easy to spot in the UI portions of the application.

By specifying a codepage in a custom encoding that has a default equivalent, you may override any of the factory-supplied encodings.

Since the mapping is built on the fly, specific optimizations cannot be performed, and the use of custom encodings may slow down processing slightly.

2.2. Associating a Font with an Encoding

A custom encoding per se is not tied to anything but the codepage it implements. To tie a codepage to a specific font, you need to extend or override the font configuration. There, you simply list the font’s name and associate it with the codepage a custom encoding implements using the keywords as described.

It is recommended to use codepage values greater than 40000 for custom encodings, as upCast does not use these codepages internally; which values you pick within that range is up to you. upCast reserves the range from 32000 to 35000 for internal use, so you should not use these values. Also note that when you override a default encoding, every font that is specified to use that encoding will use the custom one.
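The numbering rules can be summed up in a small check. This is a minimal sketch: the function name and the three labels are made up for illustration, and treating the reserved range as inclusive at both ends is an assumption.

```python
# Range reserved by upCast for internal use (assumed inclusive at both ends).
RESERVED = range(32000, 35001)


def classify_custom_codepage(codepage):
    """Classify a codepage number chosen for a custom encoding."""
    if codepage in RESERVED:
        return "reserved"            # do not use
    if codepage > 40000:
        return "recommended"         # not used by upCast internally
    return "may-override-default"    # may coincide with a factory-supplied codepage
```

For example, 42001 falls into the recommended range, whereas 1252 would replace the factory-supplied encoding for every font that uses it.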

2.3. File format

File names can be arbitrary, but must have the extension .encoding.

The file structure is simple: one mapping entry per line, and all lines starting with #, // or ; are treated as comments. To create a two-byte encoding, separate the two bytes by a comma.

A mapping entry has the form (notation similar to BNF):

mapping ::= <srcbyte> [',' <srcbyte>] '=' <unicodechar>

with:

srcbyte ::= hexNumber | decimalNumber
unicodechar ::= hexNumber | decimalNumber
hexNumber ::= ('0x' | '0X' | '$')[0-9A-Fa-f]+
decimalNumber ::= [0-9]+

The following, rather silly, example maps what is a space in codepage 1252 fonts to the at sign:

@codepage 42001
@encodingname Silly Encoding
$20=$40

Two special options are supported:

@codepage decimalNumber

This specifies the codepage this encoding represents.

You can specify either an existing encoding to override its definition, or create custom codepages for specific fonts, in which case you should choose a codepage number higher than 40000.

@encodingname asciistring

This is a descriptive name for the encoding so you can easily spot it in upCast’s UI.
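To make the file format concrete, here is a minimal sketch of a reader for .encoding files written from the grammar above. It is not upCast's loader: the names parse_encoding and _to_int are hypothetical, the tolerance for whitespace around ',' and '=' is an assumption, and so is the choice to store one-byte and two-byte sources as tuples.

```python
import re

# hexNumber | decimalNumber, as in the grammar above
_NUMBER = r"(?:0[xX][0-9A-Fa-f]+|\$[0-9A-Fa-f]+|[0-9]+)"
# mapping ::= <srcbyte> [',' <srcbyte>] '=' <unicodechar>
_MAPPING = re.compile(rf"^({_NUMBER})(?:\s*,\s*({_NUMBER}))?\s*=\s*({_NUMBER})$")


def _to_int(token):
    """Convert a hexNumber ('0x..', '0X..', '$..') or decimalNumber to int."""
    if token[:2] in ("0x", "0X"):
        return int(token[2:], 16)
    if token.startswith("$"):
        return int(token[1:], 16)
    return int(token, 10)


def parse_encoding(text):
    """Return (codepage, encodingname, table) for the contents of an .encoding file.

    `table` maps a tuple of one or two source bytes to a Unicode code point.
    """
    codepage, name, table = None, None, {}
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and lines starting with #, // or ; carry no mapping.
        if not line or line.startswith(("#", "//", ";")):
            continue
        if line.startswith("@codepage"):
            codepage = int(line.split(None, 1)[1])
        elif line.startswith("@encodingname"):
            name = line.split(None, 1)[1]
        else:
            match = _MAPPING.match(line)
            if match is None:
                raise ValueError(f"not a mapping entry: {raw!r}")
            first, second, target = match.groups()
            source = (_to_int(first),) if second is None else (_to_int(first), _to_int(second))
            table[source] = _to_int(target)
    return codepage, name, table
```

Applied to the example file shown earlier, the sketch yields codepage 42001, the name "Silly Encoding", and a single mapping from byte 0x20 to U+0040.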

Chapter 23. Troubleshooting

upCast has some useful small hooks for troubleshooting basic problems in complex installations.

1. Finding out basic version info of an upcast.jar

To get version and build info from an upcast.jar at hand, run the following on the commandline:

java -cp upcast.jar de.infinityloop.upcast.AppVersion

2. Finding out extended environment info

To retrieve extended information about the runtime environment that upCast would launch in, run the following on the commandline:

java -jar upcast.jar -version

3. Extended log info

To launch upCast with extended log info turned on even before the respective setting is read from its preferences file (GUI mode), or when running in a server environment, define the following custom Java property:

-Dde.infinityloop.loglevel=7

You can even redirect the log file location. Please see here for all supported custom properties.
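As an illustration of where the property goes, the following sketch assembles a launch command. It assumes upCast is started with java -jar upcast.jar as in the previous section; the helper name upcast_command is hypothetical. The java launcher expects -D options before -jar.

```python
def upcast_command(jar="upcast.jar", loglevel=7, extra_args=()):
    """Build the argument list for launching upCast with extended logging."""
    return [
        "java",
        f"-Dde.infinityloop.loglevel={loglevel}",  # custom Java property from above
        "-jar",
        jar,
        *extra_args,
    ]
```

The resulting list can be passed to subprocess.run, or joined with spaces to obtain the equivalent commandline.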

Chapter 24. Copyright, Licenses, Legal, Acknowledgements

1. Copyright, Licenses, Legal

1.1. upCast

upCast and accompanying support material, code and the upCast DTD are Copyright © 1999-2015 by infinity-loop GmbH, Munich, Germany.

1.2. Steadystate CSS2 parser

The application includes a slightly modified version of steadystate’s CSS2 parser. The complete modified source code can be downloaded in accordance with the requirements of its Lesser GPL from our website at: http://www.infinity-loop.de/download/legal/CSS2Parser.tgz.

1.3. Xerces, Xalan

This product includes software developed by the Apache Software Foundation (http://www.apache.org/). The full copyright notice is available here.

1.4. Apache Commons

This application includes Apache Commons, which is covered under the Apache License Version 2.0.

1.5. swing-layout (org.jdesktop.*)

This application includes swing-layout, an extension to Swing for creating professional cross-platform layouts. It is covered by the LGPL license.

1.6. W3C

The application uses work done by the W3C, which is Copyright © 2002 World Wide Web Consortium, (Massachusetts Institute of Technology, Institut National de Recherche en Informatique et en Automatique, Keio University). All Rights Reserved.
http://www.w3.org/Consortium/Legal/ -- You can view the full Copyright Notice here.

1.7. XML- and OASIS Catalog Support

XML- and OASIS Catalog support code is Copyright © 2001 Sun Microsystems, Inc. All Rights Reserved. -- You can view the full Copyright Notice here.

1.8. Saxon 6.x

This product includes "The SAXON XSLT Processor from Michael Kay", http://saxon.sourceforge.net/, in compliance with its conditions of use and its license found here.

1.9. Saxon-B 9.x

This product includes "The Saxon XSLT and XQuery Processor from Saxonica Limited", http://www.saxonica.com/, in compliance with its license as described here.

1.10. MRJAdapter

This product includes MRJAdapter.jar 1.0.9 by Steve Roy in unmodified form, which is distributed under an Artistic License. In compliance with this license, here’s the link to the package’s site for downloading the full distribution: http://homepage.mac.com/sroy/mrjadapter/.

1.11. Jaxen

Copyright 2003-2006 The Werken Company. All Rights Reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  • Neither the name of the Jaxen Project nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

1.12. Jing

Copyright (c) 2001-2003 Thai Open Source Software Center Ltd

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  • Neither the name of the Thai Open Source Software Center Ltd nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

1.13. Redstone XML-RPC Library

This product includes the Redstone XML-RPC Library v. 1.1.1 in compliance with the LGPLv2.

The source code can be obtained from here.

1.14. Apache Ant

This application includes the ant.jar library to allow for being integrated into Ant builds. The library is covered under the Apache License Version 2.0.

1.15. LogBridge

Log-bridge is a subproject of Javatools. The code is covered under the Apache License Version 2.0.

1.16. JTimepiece

JTimepiece is the advanced library for working with dates and times in Java. The code is covered under the Apache License Version 2.0.

1.17. XMLUnit

XMLUnit enables JUnit-style assertions to be made about the content and structure of XML. It is covered under the BSD License:

Copyright (c) 2001-2014, Jeff Martin, Tim Bacon

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  • Neither the name of the xmlunit.sourceforge.net nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

1.18. RSyntaxTextArea

RSyntaxTextArea is covered under a modified BSD license:

Copyright (c) 2012, Robert Futrell All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  • Neither the name of the author nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

1.19. Apache POI

Apache POI, Copyright 2009 The Apache Software Foundation. This software component is covered under the Apache License Version 2.0.

1.20. Simple 4.1.21

Simple, an embeddable Java based HTTP engine, is covered under the Apache License Version 2.0.

1.21. Flying Saucer

The Flying Saucer XML/XHTML renderer library is covered by the GNU Lesser General Public License.

1.22. BrowserLauncher2

This application uses BrowserLauncher2, which is covered by the LGPLv2 license.

2. Acknowledgements and Thanks

Thanks go out to the members of the java-dev Mailing List, without whom we would not have come this far with the Mac OS X user experience.

We’d also like to thank all users and beta-testers for the very helpful feedback and problem reports we receive. This is what helps us make upCast the best conversion tool in its field.