upCast Processing Language (UPL) Specification

Stefan & Christian Roth Hermann

Christian Roth

Revision History
Revision 1, Fri, 04 Sep 2015 13:38:00 CEST

1. Introduction
1. Goals of UPL
2. UPL in context
3. An example of UPL usage
2. UPL components
1. UPL Core
2. UPL Tree Processor
3. UPL Core (Language Reference)
1. Introduction
2. Lexical Structure
2.1. Escape Sequences
2.2. Line Terminators
2.3. Tokens
2.4. White Space
2.5. Comments
2.6. Identifiers
2.7. Keywords
2.8. Literals
2.9. Separators
2.10. Operators
3. Types
3.1. Bool
3.2. Color
3.3. Id
3.4. List
3.5. Null
3.6. Numeric
3.7. String
3.8. Void
4. Operator/Type Matrices
4.1. ++ (increment), -- (decrement)
4.2. + (unary plus), - (unary minus)
4.3. ! (not)
4.4. * (multiplication), div (division), mod (modulo)
4.5. = (equals)
4.6. != (is not equal to)
4.7. + (addition/concatenation)
4.8. - (subtraction)
4.9. < (less-than), > (greater-than)
4.10. <= (less-than or equal), >= (greater-than or equal)
4.11. or (logical), and (logical), xor
4.12. := (assignment)
4.13. cast as
4.14. castable as
4.15. instance of
5. Blocks
6. Flow control
6.1. if
6.2. if … else/elseif/else if
6.3. while
6.4. do … while
6.5. for
6.6. for-each
6.7. break
6.8. return
7. Exceptions and exception handling
8. Expressions
9. Variables
9.1. Definition
9.2. Reference
9.3. Assignment
10. Parameters
10.1. Definition
10.2. Reference
11. Functions
11.1. Defining a function
11.2. Calling a function
12. Java Function Bindings
12.1. Defining a function binding
12.2. Calling a bound Java function
13. Directives
13.1. #charset
13.2. #set
13.3. #namespace
13.4. #include
14. Debugging
14.1. Breakpoints
14.2. Watchpoints
4. UPL Tree Processor
1. Building blocks
1.1. UPL Core constructs
1.2. Rules
2. Processing model
2.1. Step 1: Run initialize()
2.2. Step 2: Walk the document tree
2.3. Step 3: Run finalize() / error-finalize()
5. UPL Function reference
1. Type casting functions
1.1. copy
1.2. to-bool
1.3. to-color
1.4. to-id
1.5. to-list
1.6. to-null
1.7. to-numeric
1.8. to-numeric
1.9. to-string
1.10. to-string
2. Functions on Colors
2.1. get-color-component
3. Date & Time functions
3.1. current-dateTime
3.2. format-dateTime
3.3. format-dateTime
4. File functions
4.1. add-to-zip-archive
4.2. add-to-zipfile (deprecated) 
4.3. copy-file
4.4. create-zip-archive
4.5. create-zipfile (deprecated) 
4.6. delete-file
4.7. extract-zip-archive
4.8. extract-zipfile (deprecated) 
4.9. file-exists (deprecated) 
4.10. fs-copy
4.11. fs-create
4.12. fs-create
4.13. fs-delete
4.14. fs-exists
4.15. fs-exists
4.16. fs-info
4.17. fs-info
4.18. fs-move
4.19. get-image-information
4.20. get-path-component
4.21. is-file
4.22. is-filetype
4.23. is-folder
4.24. list-files
4.25. list-files
4.26. list-files-recursively
4.27. list-files-recursively
4.28. move-file
4.29. read
4.30. readln
4.31. relativize-uri
4.32. resolve-uri
4.33. resolve-uri
4.34. sourcefile-uri
4.35. write
4.36. writeln
5. Grouping Functions
5.1. is-painted
5.2. mark-end
5.3. mark-end
5.4. mark-start
5.5. mark-start
5.6. paint-adjacent
5.7. paint-following
5.8. paint-preceding
5.9. set-paint-attr
5.10. set-paint-attr
5.11. set-paint-value
5.12. set-paint-value
5.13. set-painter
5.14. set-painter
6. Graphical UI functions
6.1. open-url-in-browser
6.2. set-progress
6.3. set-ui-text
6.4. show-dialog
7. Functions on Lists
7.1. append
7.2. append-all
7.3. count
7.4. distinct-values
7.5. filter-list
7.6. flatten
7.7. index-of
7.8. is-in
7.9. remove
7.10. set-value-at
7.11. sort
7.12. value-at
8. Logging functions
8.1. clear-log-messages
8.2. create-log-writer (deprecated) 
8.3. create-log-writer
8.4. discard-log-writer
8.5. forward-log-message
8.6. forward-log-messages
8.7. forward-log-messages
8.8. get-log-message-component
8.9. get-log-messages
8.10. get-log-messages
8.11. log
8.12. log-custom
8.13. start-logger
8.14. stop-logger
9. Boolean logic functions
9.1. bitwise-and
9.2. bitwise-not
9.3. bitwise-or
9.4. bitwise-xor
9.5. exists
9.6. exists-var
9.7. false
9.8. is-null
9.9. not
9.10. true
10. Functions on DOM nodes
10.1. append-attr
10.2. append-attr
10.3. attach-value
10.4. attach-value
10.5. comment
10.6. delete
10.7. detach-values
10.8. detach-values
10.9. element
10.10. element
10.11. filter-attrs
10.12. filter-attrs
10.13. get-attr
10.14. get-attr
10.15. get-attr
10.16. get-attr
10.17. get-value
10.18. insert-nodes
10.19. insert-nodes
10.20. mark-split
10.21. mark-split
10.22. move-nodes
10.23. move-nodes
10.24. name
10.25. processing-instruction
10.26. remove-attrs
10.27. remove-attrs
10.28. remove-attrs
10.29. rename-element
10.30. replace-with-children
10.31. replace-with-text
10.32. set-attr
10.33. set-attr
10.34. specifies
10.35. string
10.36. text
11. Numeric functions
11.1. abs
11.2. avg
11.3. max
11.4. max
11.5. min
11.6. min
11.7. sqrt
11.8. sum
12. Other functions
12.1. app-buildnumber
12.2. debug
12.3. delay
12.4. entering
12.5. eval-xpath
12.6. eval-xpath
12.7. generate-uuid
12.8. get-environment-value
12.9. get-outline-level
12.10. get-realm-value-names
12.11. get-rulemode
12.12. get-var
12.13. get-var
12.14. hoist-attr
12.15. hoist-attr
12.16. hoist-single-listpar
12.17. html-to-image
12.18. leaving
12.19. markup-regex
12.20. print
12.21. println
12.22. run-module
12.23. set-grouping
12.24. set-heading-level
12.25. set-process-children
12.26. set-rulemode
12.27. set-var
12.28. single-listpar-level
12.29. stop
12.30. stop
12.31. system-exec
12.32. test-xpath
12.33. throw
12.34. unique-timestamp
12.35. unmangle-string
12.36. wl-convert-doc-to-rtf
12.37. wl-convert-doc-to-rtf
12.38. wl-convert-doc-to-rtf
12.39. wl-convert-rtf-to-doc
12.40. wl-convert-rtf-to-doc
12.41. wl-run-command
12.42. wl-run-command
12.43. xslt
12.44. xslt
13. Functions for working with styles
13.1. %
13.2. markup-style
13.3. markup-style
13.4. markup-style-with-roots
13.5. markup-style-with-roots
14. String Functions
14.1. codepoints-to-string
14.2. contains
14.3. decode-from-uri
14.4. encode-for-uri
14.5. ends-with
14.6. escape-characters (deprecated) 
14.7. escape-characters
14.8. escape-characters
14.9. format-numeric
14.10. get-string-width
14.11. index-of
14.12. index-of
14.13. lower-case
14.14. matches
14.15. matches-list
14.16. normalize-space
14.17. parse-numbering
14.18. process-adjacent-text
14.19. replace
14.20. replace
14.21. starts-with
14.22. string-join
14.23. string-length
14.24. string-to-codepoints
14.25. substring
14.26. substring
14.27. substring-after
14.28. substring-before
14.29. substring-tail
14.30. substring-tail
14.31. tokenize
14.32. trim
14.33. unescape-characters
14.34. unescape-characters
14.35. upper-case

Chapter 1. Introduction

UPL (upCast Processing Language) is a specialized document processing language that aims to make the conversion of graphically marked-up documents into rich, logically marked-up documents easy.

UPL gives its users the flexibility to perform a broad range of typical and complex tasks in the realm of document conversion while hiding the complexity. UPL is intuitive to use because it borrows established concepts from well-known languages in adjacent realms such as XSLT, XPath and CSS, and it is robust and extensible. The development of UPL was not purely academic; it was likewise driven by requirements emerging in document conversion projects all around the world.

1. Goals of UPL

The simple goal of UPL is to optimize the process of converting documents under three major aspects: ease of use, effort minimization and real world usability.

UPL was developed to overcome the limitations of current document conversion and processing approaches, which are not well suited to the challenges of converting documents with mostly flat, graphically marked-up structures into logically marked-up documents. As such, it serves very well as a pre-processor for further transformations with XSLT by creating easily accessible structures.

To achieve ease of use, UPL has been designed as a simple, rule-based declarative language at the top level, with an imperative language core that can be easily learned and used by typical users of document processing systems.

UPL achieves effort minimization by offering a large number of convenience functions that solve many recurring, complex tasks in the application domain of document conversion, including “unsharp” functions that help you deal with fuzzy input.

Finally, to achieve real world usability, UPL provides programming mechanisms known from well-known languages such as Java, C and XPath. This extensibility is intended to let users of UPL successfully process all the real world documents they need to handle.

2. UPL in context

Most electronically authored documents are created with Microsoft Word, at least at some point during their history. But in contrast to the requirements of modern document pipelines and their underlying systems, Word documents are mainly graphically marked up, and their logical structure (even though easily recognized by humans) is not directly accessible to computer systems.

In order to be used in modern systems, Word documents have to be enriched with structure. Doing this manually, or with the help of manually implemented systems, is a time-consuming, expensive and error-prone task.

infinity-loop’s well-known software upCast is widely used to bridge the gap between the different styles of document markup. upCast converts graphically marked-up Word documents into logically marked-up documents by analyzing the logical structure as it is perceived by humans.

As an off-the-shelf product, upCast comes with a set of parameters that control details of this "up-casting" process. But without some help from its users (who else knows more about the documents to be handled than they do?), upCast cannot automatically mark up all the valuable information available in a document. This is the reason why the conversion and structural enrichment process in upCast 7 (which was monolithic in earlier versions) has been split into two phases.

The first phase in upCast is an importer that converts a Word document into an intermediate XML representation that contains both graphical and logical markup. In doing so upCast cleans up the source document and applies a comprehensive set of heuristics in order to extract all the standard XML constructs (like lists, tables, footnotes, …) available in Word documents automatically.

In the second phase, UPL comes into play. UPL is used to specify additional complex transformations that need to be applied to the intermediate document tree structure and that upCast cannot perform automatically because it lacks knowledge about the specific type of document.

A UPL Tree Processing specification is declarative and consists of simple rules that apply certain specialized actions or transformations on the intermediate document tree when certain conditions match. In a way, you can think of UPL as XSLT working in-place with a set of highly specialized operators.

3. An example of UPL usage

To give you a first impression of a UPL specification, here's an example of a typical application:

Example 1.1. 

Let's say we have determined that all paragraphs containing more than 85% of text that (1) is bold and (2) has a font-size between 16pt and 18pt are headings of level 1, even if they are not marked up with Word paragraph styles. To turn all paragraphs that fulfill this informal description into headings, the following UPL code could be used:

[element(uci:par) 
  and %(@css:font-weight="bold" and @css:font-size >= 16pt and @css:font-size <= 18pt) > 0.85]
{
  set-heading-level(1);
}

Chapter 2. UPL components

UPL as a whole consists of two functionally different components: the UPL Core and the UPL Tree Processor.

1. UPL Core

UPL Core consists of the core functionality such as support for typed variables, flow control constructs, function definitions, execution of statement sequences, block building and expression evaluation. UPL Core is used in upCast in several places.

2. UPL Tree Processor

The UPL Tree Processor defines an algorithm as well as a controlling and selection mechanism to process the nodes of a document tree using UPL Core functionality. It uses a selector/actions metaphor for this, similar to the design of CSS. The UPL Tree Processor is implemented in upCast's UPL Tree Processor module.

Both parts are described in the following two sections in detail.

Chapter 3. UPL Core (Language Reference)

1. Introduction

A UPL specification initially consists of one “initial input” source (e.g. text typed in the GUI or a string set via the API), also called the primary input source. Unlike in all further (included) input sources, global variables/module parameters are resolved (textually replaced) in the primary input source before it is passed to the UPL processor.

The primary input source and any other input source may contain references to other input sources to be included. Any input source is an ordered sequence of raw characters in a specific encoding (e.g. ISO-8859-1 or UTF-8) that is initially converted into a sequence of Unicode characters. Next, so-called escape sequences occurring in the input are resolved and also translated to Unicode characters. Together, these two steps lead to a sequence of input characters that are grouped into tokens (e.g. string literals). Tokens in turn are used to form the building blocks (e.g. a rule) of a UPL specification.

More formally, a UPL specification is processed by a UPL processor in six steps before it can be applied to a document:

  1. Transliteration: Converts the given input characters from a particular character encoding into a sequence of Unicode characters. Subsequently, characters specified in one of the possible escape notations are resolved into their Unicode character.

  2. Preprocessing: Included input sources are imported. Note that this happens before lexical analysis.

  3. Lexical Analysis: Translates the sequence of Unicode characters into a sequence of tokens (e.g. keywords, identifiers).

  4. Syntactic Analysis: Parses the sequence of tokens, performs syntax checking.

  5. Semantic Analysis: Statically checks the meaning of a UPL specification to ensure it obeys the rules of the UPL language.

  6. Code Generation: Translates the analyzed specification into executable code.

2. Lexical Structure

This section deals with the lexical structure of UPL specifications.

UPL specifications can be written in many different character encodings (e.g. ASCII, ISO-8859-1 or UTF-8) but are converted by a UPL processor into Unicode before they are really processed. This step is called lexical translation or transliteration.

Actually a UPL specification is nothing more than a sequence of raw characters stored in a specific encoding somewhere on a storage medium. The transliteration step converts sequences of raw characters into input characters by applying two lexical translations.

  • input characters are translated from their given encoding (e.g. ISO-8859-1) into Unicode characters.

  • so-called escape sequences are resolved and the resulting Unicode characters are substituted into the input sequence.

By default UPL supports a wide range of input encodings that cover all encodings supported by the Java programming language implementation it is running on.

2.1. Escape Sequences

UPL supports two kinds of escape sequences that allow authors to refer to characters they can't easily put in a UPL specification in specific situations: character escapes and unicode escapes.

First, character escapes cancel the meaning of special characters in UPL. Any character (except a hexadecimal digit) can be escaped with a backslash to remove its special meaning.

Example 3.1. 

For example,

"\"" 

is a string consisting of one double quote; the \ escapes the quotation mark.


Secondly, unicode escapes allow authors to refer to characters they can't easily put in a UPL specification because the encoding they use (e.g. ISO-8859-1) does not provide the desired character (e.g. a Chinese character). A unicode escape consists of a backslash \ followed by a hexadecimal number of at most six hexadecimal digits (0-9, a-f, A-F), which stands for the Unicode character at that code point. If the character following the escape is itself a hexadecimal digit, the end of the hexadecimal number needs to be made clear. There are two ways to do that: with a white space character or by providing exactly 6 hexadecimal digits. In fact, these two methods may be combined. Exactly one white space character is ignored after a hexadecimal escape.

Note

This means that "real" white space after the escape sequence must itself either be escaped or doubled.

Example 3.2. 

"\22"

is the same as

"\22 "

which is the same as

"\""

which is the same as

"\000022"

which is the same as

"\000022 "

and all examples identify a string consisting of one double quote character.

U+0022 is the Unicode codepoint for the double quotation mark.
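
The resolution rules above can be modelled outside UPL. The following Python sketch is illustrative only and is not part of UPL or upCast: the name resolve_escapes is hypothetical, and the sketch assumes that the single white space character swallowed after a unicode escape is one of space, tab, form feed, LF or CR (so CR LF would count as two characters here).

```python
import re

# A backslash followed by either
#   - 1 to 6 hexadecimal digits plus at most one swallowed white space character, or
#   - any other single character (a character escape).
_ESCAPE = re.compile(r"\\(?:([0-9a-fA-F]{1,6})[ \t\n\r\f]?|(.))", re.DOTALL)


def resolve_escapes(text):
    """Return text with UPL-style character and unicode escapes resolved."""
    def replace(match):
        hex_digits = match.group(1)
        if hex_digits is not None:
            # Unicode escape: the hexadecimal number names a code point.
            return chr(int(hex_digits, 16))
        # Character escape: the escaped character stands for itself.
        return match.group(2)

    return _ESCAPE.sub(replace, text)
```

With this model, all five spellings from Example 3.2 resolve to a single double quote character, and a second white space character after an escape is preserved as real white space.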


Note

Unicode escapes are always considered to be part of an identifier or a string (i.e., "\7B" (the Unicode escape for the character {) is not punctuation, even though { is, and "\32" (the Unicode escape for the character 2) is allowed at the start of an identifier, even though 2 is not).

Note

The common escapes \n (newline), \r (carriage return), \f (form feed) and \t (tab) are not supported in UPL. If you want to use these characters, you must use Unicode escapes instead:

Common escape  Meaning               UPL unicode escape (followed by a single space)
\n             LINE FEED             \a
\r             CARRIAGE RETURN       \d
\f             FORM FEED             \c
\t             CHARACTER TABULATION  \9

2.2. Line Terminators

The transliterated Unicode characters read from input sources are divided into lines by recognizing line terminators. This definition of lines determines the line numbers reported by the UPL processor, e.g. in warning and error messages. It also specifies the termination of the end-of-line form of a comment.

There are three line terminators defined in UPL: the ASCII LF character (also known as line feed or newline, U+000A), the ASCII CR character (also known as carriage return or simply return, U+000D) and the combination of the ASCII CR character immediately followed by the ASCII LF character.

Thus, lines in UPL are terminated by the ASCII characters LF or CR or CR LF.

Note

The two characters CR immediately followed by LF are counted as one line terminator, not two.
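
As an illustration of this rule, the following Python sketch splits a character sequence into lines the way the section describes. It is a model only; the name split_lines is hypothetical and not a UPL function.

```python
import re

# CR LF must be tried first so that it counts as one terminator, not two.
_LINE_TERMINATOR = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Split text at the line terminators LF, CR and CR LF."""
    return _LINE_TERMINATOR.split(text)


def line_number_at(text, offset):
    """Return the 1-based line number of the character at the given offset."""
    return len(_LINE_TERMINATOR.findall(text[:offset])) + 1
```

Listing the CR LF alternative before the single-character alternatives is what keeps a Windows-style line ending from being counted twice.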

2.3. Tokens

The input characters and line terminators that result from transliteration, escape processing and line recognition are scanned and produce a sequence of input elements. Those input elements that are not white space or comments are so-called tokens. Tokens are the terminal symbols that are used to describe how syntactically correct UPL specifications may be written.

White space and comments can serve to separate tokens that, if adjacent, might be tokenized in another manner.

2.4. White Space

White space is defined as the ASCII characters space (U+0020), horizontal tab (U+0009), form feed (U+000C), as well as the line terminators (U+000A, U+000D).

2.5. Comments

Like Java and other modern languages, UPL supports two different types of comments: traditional comments and end-of-line comments. As usual, in both cases the content of comments is completely discarded by a UPL processor.

A traditional comment consists of all the text from the start-marker /* up to the end-marker */.

Note

Both markers and the text in-between belong to the comment.

Note

In contrast to most other languages, UPL supports nested traditional comments. Thus, a traditional comment may contain child comments, and hence the number of end-markers for traditional comments in a UPL specification must be balanced with the number of start-markers.

Example 3.3. 

/* stop(); /* stop execution */ */

will be a valid UPL program with the comment spanning from the first /* to the last */ (in contrast to parsing this e.g. in Java, where the comment would end after the first */).

This comes in handy when you want to temporarily and quickly comment out a section of code that already contains multi-line style comments.


The second supported comment type is the end-of-line comment. It comprises all the text from the start-marker // up to the end of the line, designated by the next line terminator.

Note

The markers /* and */ have no special meaning in end-of-line comments and the marker // has no special meaning in traditional comments.

Note

As usual, comments are not analyzed within strings.
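
The comment rules of this section can be sketched as a small scanner. The Python code below is a simplified model, not the UPL processor's implementation: the name strip_comments is hypothetical, and verbatim string literals (section 2.8.3) are deliberately not handled.

```python
def strip_comments(src):
    """Remove nested /* ... */ comments and // end-of-line comments."""
    out = []
    i, depth, n = 0, 0, len(src)
    while i < n:
        two = src[i:i + 2]
        if depth > 0:
            # Inside a traditional comment only /* and */ are significant.
            if two == "/*":
                depth += 1
                i += 2
            elif two == "*/":
                depth -= 1
                i += 2
            else:
                i += 1
        elif two == "/*":
            depth = 1
            i += 2
        elif two == "//":
            # Skip to the next line terminator, which is kept.
            while i < n and src[i] not in "\r\n":
                i += 1
        elif src[i] in "\"'":
            # Copy a string literal unchanged; comments are not analyzed in it.
            quote = src[i]
            j = i + 1
            while j < n and src[j] != quote:
                j += 2 if src[j] == "\\" else 1
            out.append(src[i:j + 1])
            i = j + 1
        else:
            out.append(src[i])
            i += 1
    if depth:
        raise ValueError("unbalanced traditional comment markers")
    return "".join(out)
```

The depth counter is the only state needed for nesting: every /* inside a comment increases it and every */ decreases it, so the comment in Example 3.3 ends at the last */ rather than the first.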

2.6. Identifiers

A UPL identifier is made from the following characters, but has to start with a letter:

Letters: a-z, A-Z, and other alphabetic characters from other languages

Digits: 0-9

Specials: - (minus) and _ (underscore)

Colon: An identifier may also contain the colon (:) character, which is used to separate a namespace prefix part from the local name part e.g. when using an identifier to designate namespaced element nodes.

Additionally, UPL identifiers may contain character and unicode escape sequences.

No identifier can have the same name as a UPL keyword, a boolean literal or the null literal.

2.7. Keywords

The following character sequences are reserved for use as keywords and cannot be used as identifiers:

and
as
break
block
case
catch
continue
default
div
do
else
elseif
finally
for
for-each
function
goto
if
javaclass
method
mod
or
parameter
return
switch
then
try
variable
while

While true, false, void and null might appear to be keywords, they are technically literals.

Note

UPL is still a somewhat moving target. Therefore, please also avoid using names for identifiers that have a specific meaning in one of the well-known programming languages like Java, C# or XSLT; one day UPL may have to use one of those languages' keywords.

2.8. Literals

A literal is the source code representation of a value of one of the data types defined in UPL.

2.8.1. Boolean literals

The UPL type Bool has two values, represented by the literals true and false.

2.8.2. String literals

A string literal consists of zero or more characters enclosed in double or single quotes (both quotes, the starting and the ending one, for a string have to be of the same kind). Each character in the string may be represented by an escape sequence.

"This is a \"String\" literal."

2.8.3. Verbatim string literals

A verbatim string literal consists of zero or more characters enclosed in a pair of three consecutive double or single quotes (both the starting and the ending ones for a string have to be of the same kind).

No escape resolution or further parsing takes place within a verbatim string literal.

"""This is a verbatim "String" literal."""
"""A verbatim "String" literal
may contain line breaks directly, and
escapes (e.g. \") are not resolved, but remain as written."""

2.8.4. Numeric literals

In UPL all values dealing with numbers (like integers and floats known from other programming languages) are of the same sort: Numeric. Even lengths with dimensions are of sort Numeric.

A Numeric has the following parts: a whole-number part, a decimal point (represented by a period character), a fractional part and a dimension identifier. Not all parts of a Numeric are mandatory for a numeric literal. A Numeric representing a number may consist of any combination of the whole-number-part, the decimal point and the fractional part as long as at least one digit contributes to the literal. A Numeric literal may end with a dimension identifier, but does not have to.

The following dimension identifiers are defined in UPL: tw (twips), px (pixel), cm, mm, in, pt, pc.

A pt is 1/72 inch.

A px is 1/96 inch.

A pc is 1/6 inch.

A cm is 1/2.54 inch.

A tw is 1/20 of a pt, which translates to 1/1440 inch.
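
The unit definitions above imply fixed conversion factors to twips. The following Python sketch works them out with exact fractions; it is a model only, the name to_twips is hypothetical, and the factor for mm is an assumption derived from the cm definition (1 mm = 1/25.4 inch), which the list above does not state explicitly.

```python
from fractions import Fraction

# Twips per unit, derived from 1 in = 1440 tw.
TWIPS_PER_UNIT = {
    "tw": Fraction(1),
    "pt": Fraction(1440, 72),      # 1/72 inch  -> 20 tw
    "px": Fraction(1440, 96),      # 1/96 inch  -> 15 tw
    "pc": Fraction(1440, 6),       # 1/6 inch   -> 240 tw
    "in": Fraction(1440),
    "cm": Fraction(144000, 254),   # 1/2.54 inch
    "mm": Fraction(14400, 254),    # assumption: 1/25.4 inch
}


def to_twips(amount, unit):
    """Convert a length to twips. Pass decimals as strings, e.g. "2.54"."""
    return Fraction(amount) * TWIPS_PER_UNIT[unit]
```

For example, to_twips("2.54", "cm") and to_twips(1, "in") both yield 1440 twips, and to_twips(12, "pt") equals to_twips(1, "pc").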

2.8.5. Color literals

The basic color literal construct is the well-known hexadecimal notation: a # immediately followed by either three or six hexadecimal digits. In a given context where a Color is expected (e.g. in an assignment or a comparison), colors may also be identified by one of the Ids aqua, black, blue, fuchsia, gray, green, lime, maroon, navy, olive, orange, purple, red, silver, teal, white, and yellow (see http://www.w3.org/TR/CSS21/syndata.html#color-units).
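
As a rough model of how such literals map to color components, the Python sketch below parses both notations. It is not UPL's implementation: the name parse_color is hypothetical, the expansion of three-digit values by doubling each digit is an assumption taken from the CSS 2.1 rules the section links to, and the keyword values are those defined by CSS 2.1.

```python
import re

_HEX_COLOR = re.compile(r"#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})\Z")

# Color keywords and their values as defined by CSS 2.1.
CSS21_COLORS = {
    "aqua": "#00ffff", "black": "#000000", "blue": "#0000ff",
    "fuchsia": "#ff00ff", "gray": "#808080", "green": "#008000",
    "lime": "#00ff00", "maroon": "#800000", "navy": "#000080",
    "olive": "#808000", "orange": "#ffa500", "purple": "#800080",
    "red": "#ff0000", "silver": "#c0c0c0", "teal": "#008080",
    "white": "#ffffff", "yellow": "#ffff00",
}


def parse_color(text):
    """Return (red, green, blue) for a hexadecimal literal or a color Id."""
    match = _HEX_COLOR.match(text)
    if match:
        digits = match.group(1)
        if len(digits) == 3:
            # Assumed CSS-style expansion: #abc means #aabbcc.
            digits = "".join(d * 2 for d in digits)
        return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))
    if text in CSS21_COLORS:
        return parse_color(CSS21_COLORS[text])
    raise ValueError("not a color literal: " + text)
```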

2.8.6. List literals

A list literal is denoted by a {}-pair enclosing the elements of the list. The single list elements are separated by a comma ','.

2.8.7. Null literal

null is the literal for the value of type Null.

2.8.8. Void literal

void is the literal for the one single value of type Void.

2.9. Separators

The following characters are separators (punctuators) known in UPL:

(  )  {  }  [  ]  ;  ,  :

2.10. Operators

The following tokens are the operators known in UPL (ordered by precedence as specified from highest at the top to lowest at the bottom):


Operator syntax  Description                                Associativity  Precedence
++               increment                                  right-to-left  12 (highest)
--               decrement                                  right-to-left  12
+                unary plus                                 right-to-left  12
-                unary minus                                right-to-left  12
!                not                                        right-to-left  12
cast as          cast to a certain type                     left-to-right  11
castable as      test for being castable to a certain type  left-to-right  10
instance of      checking for a certain type                left-to-right  9
*                multiplication                             left-to-right  8
div              division                                   left-to-right  8
mod              modulo                                     left-to-right  8
+                addition; concatenation                    left-to-right  7
-                subtraction                                left-to-right  7
<                less than                                  left-to-right  6
<=               less than or equal to                      left-to-right  6
>                greater than                               left-to-right  6
>=               greater than or equal to                   left-to-right  6
=                equals                                     left-to-right  5
!=               is not equal to                            left-to-right  5
and              conditional and                            left-to-right  4
xor              logical exclusive or                       left-to-right  3
or               conditional or                             left-to-right  2
:=               assignment                                 right-to-left  1 (lowest)

Operators and and or

The and and or operators perform conditional AND and conditional OR operations on two boolean expressions. These operators exhibit "short-circuiting" behavior, which means that the second operand is evaluated only if needed.

Operator --

Note that the operator '--' must be separated by whitespace from the variable name. This is required because variable names may contain the '-' character as part of their name.

So, instead of

$i--; /* wrong! */

you need to write

$i --; /* correct */
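
Why the white space matters can be seen with a simplified model of how a variable reference is scanned. The Python sketch below is an approximation only: the name first_variable and the regular expression are hypothetical, the $ prefix is taken from the examples in this document, and escape sequences inside identifiers are ignored.

```python
import re

# "$", then a letter, then any run of letters, digits, "-", "_" or ":".
_VARIABLE = re.compile(r"\$[^\W\d_][\w:\-]*")


def first_variable(src):
    """Return the variable reference at the start of src, or None."""
    match = _VARIABLE.match(src)
    return match.group(0) if match else None
```

Under this model, first_variable("$i--;") returns "$i--", so both minus signs are consumed as part of the name, while first_variable("$i --;") returns "$i" and leaves the operator as a separate token.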

Operator mod

The operator is only implemented for Numerics. However, these may be of different power.

The following holds:

  1. $a mod 0 = $a

  2. the method implements the remainder-style mod operator as XSLT does

  3. the sign of the result is the same as that of the dividend
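
The three properties can be checked against a small model. The Python sketch below is an illustration under stated assumptions, not UPL's implementation: the name upl_mod is hypothetical, dimensions are ignored, and math.fmod is used because Python's own % operator takes the sign of the divisor instead.

```python
import math


def upl_mod(a, b):
    """Remainder-style modulo as described above (plain numbers only)."""
    if b == 0:
        # Property 1: $a mod 0 = $a
        return a
    # Properties 2 and 3: truncating remainder, sign follows the dividend.
    return math.fmod(a, b)
```

For example, upl_mod(-7, 3) gives -1.0 and upl_mod(7, -3) gives 1.0, whereas Python's -7 % 3 evaluates to 2.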

3. Types

UPL is not a strongly typed language, which means that not all types for UPL language components can be calculated at compile time. Types limit the values that a variable or a parameter can hold or that an expression can produce, limit the operations supported on those values and determine the meaning of the operations.

There are seven data types defined in UPL: Bool, Color, Id, List, Null, Numeric and String. All types are derived from the type Value that is the common superclass for all other classes.

3.1. Bool

The Bool type represents a logical quantity with two possible values, indicated by the literals true and false. Besides the literals the additional constructors true() and false() are available.

3.2. Color

The data type Color is used to represent colors in UPL. Values for colors may be written in the well-known hexadecimal notation: a # immediately followed by either three or six hexadecimal digits. Alternatively, colors may be identified by one of the Ids aqua, black, blue, fuchsia, gray, green, lime, maroon, navy, olive, orange, purple, red, silver, teal, white, and yellow (see http://www.w3.org/TR/CSS21/syndata.html#color-units).

3.3. Id

The data type Id represents either named values used as parameters for functions (which you'll find listed in that function's documentation), or element or attribute names. An Id may therefore carry a namespace, written as the declared namespace prefix. The syntax for an Id is:

(nsprefix ':')? name

Both the nsprefix and the name components of an Id must be UPL identifiers, with the exception that nsprefix must not contain a colon (':').

3.4. List

The data type List is used to represent an ordered sequence of values of any type (including List itself). The syntax for a constant List is:

'{' value? (',' value)* '}'

The empty list is therefore created by writing {}.

value can be either a constant value represented by a literal, a variable reference or any other expression yielding a value result. Values are inserted with their respective type into the list. A list can be heterogeneous, i.e. it can contain elements of different types at the same time. A UPL List is therefore similar to Java's ArrayList class (on which it is also based internally).

The special conversion function to-list() may be used to convert values of other classes to a List.

Example 3.4. 

{}

creates an empty list

{ 1, 2, 5.5 }

creates a list with the three Numeric values 1, 2 and 5.5

{ "Error: ", $code }

creates a list of the String value "Error: " and the contents of the variable $code at the time of List construction

{ { 1, 1 }, { 2, 4 }, { 3, 9 } }

creates a List of three lists, each of which contains two Numeric values

{ { 1, square(1) }, { 2, square(2) }, { 3, square(3) } }

creates the same list as in the previous example, assuming the square() function is defined to calculate the squared value of its argument


3.5. Null

The data type Null is used to represent exceptional values or non-existing values.

Example 3.5. 

@color

either returns the contents of the XML attribute color on the context node as a String, or a value of type Null when the context node does not have such an attribute.

Note that just returning an empty string is not a valid alternative solution in the latter case as it would not allow you to distinguish the situation where the attribute is present, but has an (allowed) value of the empty string, from the situation where the attribute is not present at all.


3.6. Numeric

The data type Numeric is used to represent dimensionless or dimensioned numbers, e.g. 42, 3.1415, 2.54cm, -360tw.

They are either the result of function calls or can be created using Numeric literals.

Important

Dimensioned numbers (e.g. lengths) are internally always normalized to twips (tw; a twentieth of a point). Calculations are always performed on normalized values.

3.7. String

The data type String is used to represent strings in UPL. As usual, string literals are characters enclosed in double or single quotes.

The following operators are defined for the data type String: = (equals), != (not-equals), < (less than), > (greater than), <= (less than or equal to), >= (greater than or equal to), + (addition, string concatenation).

Note

The + operator in UPL is overloaded as it is in Java. As a convenience you may simply concatenate strings by using the + operator. Additionally, if the left-hand operand for + is a string, any other data type on the right hand side of the + operator will automatically be cast to String and a string concatenation will be carried out.

Example 3.6. 

If you write the following in UPL:

"number:" + 10 + 1

you will get

"number:101"

which is a string concatenation of "number:", "10" and "1".

If you want to carry out a mathematical addition on 10 and 1 before adding the result to the string, you have to put the expression 10 + 1 into parentheses and write

"number:" + (10 + 1)
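
The left-to-right evaluation behind this result can be traced with a small model. The Python sketch below is illustrative only: the name upl_plus is hypothetical, it covers just the case described in the note (a String on the left-hand side), and it assumes that a Numeric is cast to the same digits Python's str() produces.

```python
def upl_plus(left, right):
    """Model of + for a String left operand or two plain numbers."""
    if isinstance(left, str):
        # A String on the left casts the right operand to String.
        return left + str(right)
    return left + right


# + is left-associative, so the leftmost + is evaluated first.
without_parentheses = upl_plus(upl_plus("number:", 10), 1)
with_parentheses = upl_plus("number:", upl_plus(10, 1))
```

Here without_parentheses is "number:101" and with_parentheses is "number:11", matching the example.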

3.8. Void

The data type Void is used to declare functions that do not return a value.

Example 3.7. 

function hello-world() as Void {
  print( "Hello, world!" );
}

hello-world() is defined to not return a value and therefore is not required to include a return statement.


4. Operator/Type Matrices

This section lists the possible combinations of operand types and operators. The tables are to be read with the left column as the first operand type and the top row as the second operand type, in the form

A ○ B

○: the operator given in the upper left table corner

Left column: A

Top row: B

✕: the operation is not defined and will throw an EvalException

4.1. ++ (increment), -- (decrement)

| A       | ++, --           |
|---------|------------------|
| Bool    | ✕                |
| Color   | ✕                |
| Id      | ✕                |
| List    | ✕                |
| Null    | ✕                |
| Numeric | ++: A+1; --: A-1 |
| String  | ✕                |

4.2. + (unary plus), - (unary minus)

| A       | +, -            |
|---------|-----------------|
| Bool    | ✕               |
| Color   | ✕               |
| Id      | ✕               |
| List    | ✕               |
| Null    | ✕               |
| Numeric | +: A; -: (-A)   |
| String  | ✕               |

4.3. ! (not)

| A       | !           |
|---------|-------------|
| Bool    | !A          |
| Color   | !to-bool(A) |
| Id      | !to-bool(A) |
| List    | !to-bool(A) |
| Null    | !to-bool(A) |
| Numeric | !to-bool(A) |
| String  | !to-bool(A) |

4.4. * (multiplication), div (division), mod (modulo)

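| *, div, mod | Bool | Color | Id | List | Null | Numeric | String |
|-------------|------|-------|----|------|------|---------|--------|
| Bool        | ✕    | ✕     | ✕  | ✕    | ✕    | ✕       | ✕      |
| Color       | ✕    | ✕     | ✕  | ✕    | ✕    | ✕       | ✕      |
| Id          | ✕    | ✕     | ✕  | ✕    | ✕    | ✕       | ✕      |
| List        | ✕    | ✕     | ✕  | ✕    | ✕    | ✕       | ✕      |
| Null        | ✕    | ✕     | ✕  | ✕    | ✕    | ✕       | ✕      |
| Numeric     | ✕    | ✕     | ✕  | ✕    | ✕    | A○B     | ✕      |
| String      | ✕    | ✕     | ✕  | ✕    | ✕    | ✕       | ✕      |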

4.5. = (equals)

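| =       | Bool  | Color                         | Id            | List      | Null  | Numeric | String |
|---------|-------|-------------------------------|---------------|-----------|-------|---------|--------|
| Bool    | A=B   | ✕                             | ✕             | ✕         | false | ✕       | ✕      |
| Color   | ✕     | Ar=Br & Ag=Bg & Ab=Bb & Aa=Ba | A=to-color(B) | ✕         | false | ✕       | ✕      |
| Id      | ✕     | to-color(A)=B                 | A=B           | ✕         | false | ✕       | ✕      |
| List    | ✕     | ✕                             | ✕             | ∀i(Ai=Bi) | false | ✕       | ✕      |
| Null    | false | false                         | false         | false     | true  | false   | false  |
| Numeric | ✕     | ✕                             | ✕             | ✕         | false | A=B     | ✕      |
| String  | ✕     | ✕                             | ✕             | ✕         | false | ✕       | A=B    |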

Note

Please note that any value of type Null is equal to any other value of type Null, even if the two are not the same object internally (object identity). For example,

to-null("Hello") = to-null("World!")

evaluates to true, even though the two value objects returned by to-null() are not identical.

This is particularly noteworthy when building selectors like

[ @class = @rawclass ] { ...do something... }

which evaluates to true when neither of the two attributes referred to exists on the element. In that case, the selector effectively becomes [ null = null ] (as @attname is defined to return null when the attribute attname does not exist), which is true!
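One way to avoid this pitfall is to test for Null before comparing. The following is a minimal sketch only: the variables $class and $rawclass are hypothetical stand-ins for the two attribute values, and the operand syntax of instance of (value on the left, type name on the right) is assumed from the operator matrices rather than taken from a documented example.

```upl
/* Both values are Null here, mimicking two attributes that do not exist */
variable $class as Value := to-null( "Hello" );
variable $rawclass as Value := to-null( "World!" );

if( $class = $rawclass ) {
  println( "plain comparison matches, although neither value exists" );
}

/* 'and' stops after its first operand when that operand is false */
if( !( $class instance of Null ) and ( $class = $rawclass ) ) {
  println( "guarded comparison matches" );   /* not reached for two Null values */
}
```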

4.6. != (is not equal to)

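| !=      | Bool | Color                             | Id             | List       | Null  | Numeric | String |
|---------|------|-----------------------------------|----------------|------------|-------|---------|--------|
| Bool    | A!=B | ✕                                 | ✕              | ✕          | true  | ✕       | ✕      |
| Color   | ✕    | Ar!=Br \| Ag!=Bg \| Ab!=Bb \| Aa!=Ba | A!=to-color(B) | ✕          | true  | ✕       | ✕      |
| Id      | ✕    | to-color(A)!=B                    | A!=B           | ✕          | true  | ✕       | ✕      |
| List    | ✕    | ✕                                 | ✕              | ∃i(Ai!=Bi) | true  | ✕       | ✕      |
| Null    | true | true                              | true           | true       | false | true    | true   |
| Numeric | ✕    | ✕                                 | ✕              | ✕          | true  | A!=B    | ✕      |
| String  | ✕    | ✕                                 | ✕              | ✕          | true  | ✕       | A!=B   |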

!= is !(=)

You can think of the values in the above table as being derived by calculating !(A=B), i.e. by negating the operator results given in the table for = (section 4.5).

4.7. + (addition/concatenation)

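| +       | Bool           | Color          | Id             | List                     | Null     | Numeric        | String |
|---------|----------------|----------------|----------------|--------------------------|----------|----------------|--------|
| Bool    | ✕              | ✕              | ✕              | ✕                        | ✕        | ✕              | ✕      |
| Color   | ✕              | ✕              | ✕              | ✕                        | ✕        | ✕              | ✕      |
| Id      | ✕              | ✕              | ✕              | ✕                        | ✕        | ✕              | ✕      |
| List    | A⊕B            | A⊕B            | A⊕B            | A⊕B (element by element) | A⊕B      | A⊕B            | A⊕B    |
| Null    | ✕              | ✕              | ✕              | ✕                        | ✕        | ✕              | ✕      |
| Numeric | ✕              | ✕              | ✕              | ✕                        | ✕        | A+B            | ✕      |
| String  | A⊕to-string(B) | A⊕to-string(B) | A⊕to-string(B) | A⊕to-string(B)           | A⊕"null" | A⊕to-string(B) | A⊕B    |

+ : addition    ⊕: concatenation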
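The order of the operands matters for +, because only a String (or List) on the left-hand side triggers concatenation. The sketch below illustrates three cells of the matrix; the variable name $missing is hypothetical, and the assumption that an undefined combination surfaces as a catchable EvalException follows the legend at the start of this section.

```upl
variable $missing as Value := to-null( "" );

println( "value: " + $missing );   /* String + Null: prints value: null */
println( "count: " + 3 );          /* String + Numeric: "count: " ⊕ to-string(3) */

try {
  println( 3 + " items" );         /* Numeric + String is not defined */
} catch( $e as EvalException ) {
  println( "not defined: " + to-string( $e ) );
}
```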

4.8. - (subtraction)

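| -       | Bool | Color | Id | List | Null | Numeric | String |
|---------|------|-------|----|------|------|---------|--------|
| Bool    | ✕    | ✕     | ✕  | ✕    | ✕    | ✕       | ✕      |
| Color   | ✕    | ✕     | ✕  | ✕    | ✕    | ✕       | ✕      |
| Id      | ✕    | ✕     | ✕  | ✕    | ✕    | ✕       | ✕      |
| List    | ✕    | ✕     | ✕  | ✕    | ✕    | ✕       | ✕      |
| Null    | ✕    | ✕     | ✕  | ✕    | ✕    | ✕       | ✕      |
| Numeric | ✕    | ✕     | ✕  | ✕    | ✕    | A-B     | ✕      |
| String  | ✕    | ✕     | ✕  | ✕    | ✕    | ✕       | ✕      |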

4.9. < (less-than), > (greater-than)

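| <, >    | Bool | Color | Id | List | Null | Numeric | String      |
|---------|------|-------|----|------|------|---------|-------------|
| Bool    | A○B  | ✕     | ✕  | ✕    | ✕    | ✕       | ✕           |
| Color   | ✕    | ✕     | ✕  | ✕    | ✕    | ✕       | ✕           |
| Id      | ✕    | ✕     | ✕  | ✕    | ✕    | ✕       | ✕           |
| List    | ✕    | ✕     | ✕  | ✕    | ✕    | ✕       | ✕           |
| Null    | ✕    | ✕     | ✕  | ✕    | ✕    | ✕       | ✕           |
| Numeric | ✕    | ✕     | ✕  | ✕    | ✕    | A○B     | ✕           |
| String  | ✕    | ✕     | ✕  | ✕    | ✕    | ✕       | comp(A,B)○0 |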

4.10. <= (less-than or equal), >= (greater-than or equal)

The operators <= and >= are calculated as follows, according to the matrices above:

A<=B ::= (A=B) or (A<B)

where an exception during evaluation of (A=B) is treated as false.

A>=B ::= (A=B) or (A>B)

where an exception during evaluation of (A=B) is treated as false.

The evaluation is succeed-fast, i.e. if (A=B) is true, the second operand of the or-expression is not evaluated.
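A consequence of these definitions is that <= and >= can succeed for operand types on which < and > alone are not defined. The following sketch illustrates this with two Null values, for which (A=B) is true; the variable names are hypothetical, and the string case assumes that comp() orders strings lexicographically.

```upl
variable $a as Value := to-null( "a" );
variable $b as Value := to-null( "b" );

/* (A=B) is true for two Null values, so (A<B) is never evaluated */
if( $a <= $b ) {
  println( "null <= null holds" );
}

/* For two different strings, (A=B) is false, so (A<B) decides */
if( "abc" <= "abd" ) {
  println( "abc <= abd holds" );
}
```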

4.11. or (logical), and (logical), xor

| or, and, xor | Value                 |
|--------------|-----------------------|
| Value        | to-bool(A)○to-bool(B) |

The order of evaluation for the operators or and and is from left to right, and it is guaranteed that only that many operands are evaluated as necessary to determine the final result.

For xor, both operands are always evaluated, and the order of evaluation is not defined.
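The left-to-right, minimal evaluation of or and and can be observed with a function that has a visible side effect. This is a minimal sketch; noisy() is a hypothetical helper defined only for this illustration.

```upl
function noisy( $label as String, $result as Bool ) as Bool {
  println( "evaluating " + $label );
  return $result;
}

/* The left operand is already true, so the right operand is not evaluated */
if( noisy( "left", true() ) or noisy( "right", true() ) ) {
  println( "done" );
}
```

This prints "evaluating left" followed by "done"; the call with the label "right" is never made.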

4.12. := (assignment)

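| :=      | Bool | Color | Id             | List | Null | Numeric | String |
|---------|------|-------|----------------|------|------|---------|--------|
| Bool    | A:=B | ✕     | ✕              | ✕    | A:=B | ✕       | ✕      |
| Color   | ✕    | A:=B  | A:=to-color(B) | ✕    | A:=B | ✕       | ✕      |
| Id      | ✕    | ✕     | A:=B           | ✕    | A:=B | ✕       | ✕      |
| List    | ✕    | ✕     | ✕              | A:=B | A:=B | ✕       | ✕      |
| Null    | ✕    | ✕     | ✕              | ✕    | A:=B | ✕       | ✕      |
| Numeric | ✕    | ✕     | ✕              | ✕    | A:=B | A:=B    | ✕      |
| String  | ✕    | ✕     | ✕              | ✕    | A:=B | ✕       | A:=B   |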

4.13. cast as

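| cast as | Bool | Color | Id | List | Null | Numeric | String |
|---------|------|-------|----|------|------|---------|--------|
| Bool    | A | ✕ | true → true; false → false | ✕ | null | true → 1; false → 0 | true → "true"; false → "false" |
| Color   | ✕ | A | ✕ | ✕ | null | ✕ | ✕ |
| Id      | true \| 1 → true; false \| 0 → false; other → ✕ | lexical representation of a CSS 2.0 color → to-color(to-string(A)); other → ✕ | A | ✕ | null | ✕ | to-string(A) |
| List    | ✕ | ✕ | ✕ | A | null | ✕ | to-string(A) |
| Null    | ✕ | ✕ | ✕ | ✕ | A | ✕ | ✕ |
| Numeric | 0 → false; other → true | ✕ | ✕ | ✕ | null | A | to-string(A) |
| String  | "true" \| "1" → true; "false" \| "0" → false; other → ✕ | lexical representation of Color → to-color(A); other → ✕ | to-id(A) | ✕ | null | lexical representation of Numeric → to-numeric(A); other → ✕ | A |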

4.14. castable as

castable as

For every combination of A (left column: Bool, Color, Id, List, Null, Numeric, String) and B (top row: the same types), the result is:

false, when the entry in the matrix for A cast as B is ✕

true, otherwise
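Because castable as is false exactly where cast as would fail, it can serve as a guard before a cast. The following is a minimal sketch; the variable names are hypothetical, and the operand syntax (value on the left, type name on the right) is assumed from the matrix headings rather than taken from a documented example.

```upl
variable $raw as String := "2.54cm";
variable $width as Numeric := 0;

if( $raw castable as Numeric ) {
  $width := $raw cast as Numeric;      /* safe: the guard has already succeeded */
} else {
  println( "not a number: " + $raw );
}
```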

4.15. instance of

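| instance of | Bool  | Color | Id    | List  | Null  | Numeric | String | Value |
|-------------|-------|-------|-------|-------|-------|---------|--------|-------|
| Bool        | true  | false | false | false | false | false   | false  | true  |
| Color       | false | true  | false | false | false | false   | false  | true  |
| Id          | false | false | true  | false | false | false   | false  | true  |
| List        | false | false | false | true  | false | false   | false  | true  |
| Null        | false | false | false | false | true  | false   | false  | true  |
| Numeric     | false | false | false | false | false | true    | false  | true  |
| String      | false | false | false | false | false | false   | true   | true  |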

5. Blocks

A block groups a sequence of statements and defines a scope for variables that are visible only within that block.

A block must begin with the keyword block followed by a left brace { and it must end with a right brace }.

Example 3.8. 

block {
  variable $greeting as String := "hello";
  println( $greeting );
}

Blocks that specify scopes can also be used when defining custom loggers (see start-logger(), stop-logger()).
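The scoping effect of a block can be sketched as follows. The variable names are hypothetical; the sketch assumes that $count is defined at the top level and is therefore a global variable.

```upl
variable $count as Numeric := 0;

block {
  variable $greeting as String := "hello";
  $count := $count + 1;        /* the global variable is visible inside the block */
  println( $greeting );
}

/* $greeting is no longer visible here; $count still is */
println( "blocks run: " + $count );
```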

6. Flow control

The statements inside your UPL source files are generally executed from top to bottom, in the order that they appear. Flow control statements, however, break up the flow of execution by employing decision making, looping, and branching, enabling your program to conditionally execute particular blocks of code. This section describes the decision-making statements (if, if … else), the looping statements (for, for-each, while, do … while), and the branching statements (break, return).

6.1. if

The if statement tells your program to execute a certain section of code only if a particular test evaluates to true.

Example 3.9. 

if($number = 1) 
{
  print("unicycle");
}

Note

Please note that in contrast to other programming languages like Java or C, the opening and closing braces are required.

6.2. if … else/elseif/else if

The if … else statement provides a secondary path of execution when the test evaluates to false.

Example 3.10. 

if($number = 1) 
{
  print("unicycle");
} else {
  print("bike");
}

There is also a variant of the if … else statement that supports more than two branches, using either the elseif keyword or the two separate tokens else if:

Example 3.11. 

if($number = 1) {
  print("unicycle"); 
} elseif($number = 2) { // keyword 'elseif'
  print("bike"); 
} else if($number = 3) { // combination of keywords 'else' and 'if'
  print("trike"); 
} else {
  print("quad");
}

Note

Please note that in contrast to other programming languages like Java or C, the opening and closing braces are required.

6.3. while

The while statement continually executes a block of statements while a particular condition is true.

Example 3.12. 

while( $number > 0 ) 
{ 
  $number := $number - 1; 
}

Tip

You can implement an infinite loop using the while statement as follows:

while( true() ) {
  /* place your code here */ 
}

Note

Please note that in contrast to other programming languages like Java or C, the opening and closing braces are required.

6.4. do … while

UPL also includes a do … while statement.

Example 3.13. 

do { 
  $number := $number - 1; 
} while( $number > 0 );

Note

Please note that in contrast to other programming languages like Java or C, the opening and closing braces are required.

The difference between do … while and while is that do … while evaluates its expression at the bottom of the loop instead of the top. Therefore, the statements within the do block are always executed at least once.
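The following sketch contrasts the two loop forms for a condition that is false from the start; the printed texts are arbitrary.

```upl
variable $number as Numeric := 0;

while( $number > 0 ) {
  println( "while body" );     /* never reached: the test fails first */
  $number := $number - 1;
}

do {
  println( "do body" );        /* reached exactly once */
  $number := $number - 1;
} while( $number > 0 );
```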

6.5. for

The for statement provides a compact way to iterate over a range of values. Programmers often refer to it as the "for loop" because of the way in which it repeatedly loops until a particular condition is satisfied. The general form of the for statement can be expressed as follows:

for (initialization; termination; post-iteration) {
  /* your code goes here */
}

When using this version of the for statement, keep in mind that:

  • The initialization expression initializes the loop; it's executed once, as the loop begins.

  • When the termination expression evaluates to false, the loop terminates.

  • The post-iteration expression is invoked after each iteration through the loop; it is perfectly acceptable for this expression to increment or decrement a value, or to perform any other statement.

Example 3.14. 

variable $number as Numeric;
for( $number := 0 ; $number < 5 ; $number++ ) 
{ 
  /* place your code here */ 
}

Note that the variable $number must be defined outside the for statement; you cannot define it in the initialization expression.


All three expressions of the for loop are optional. An infinite loop can be created as follows:

for ( ; ; ) {
  /* your code goes here */ 
}
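As a slightly fuller illustration of the three expressions working together, the following sketch sums the numbers 1 to 4; the variable names are hypothetical.

```upl
variable $i as Numeric;
variable $sum as Numeric := 0;

for( $i := 1 ; $i <= 4 ; $i++ ) {
  $sum := $sum + $i;
}
/* $sum is now 1 + 2 + 3 + 4 = 10 */
```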

6.6. for-each

The for-each statement provides a compact way to iterate over all the items contained in a list.

Example 3.15. 

variable $number as Numeric;
for-each( $number in {1,4,5} )
{
  print($number); 
}

The variable $number must already be defined outside the for-each statement. It is set in turn to each value contained in the list, in element order.


6.7. break

A break statement terminates the innermost while, do … while, for or for-each statement.

Example 3.16. 

for-each( $number in {1,4,5} ) 
{
  if($number = 4) 
  { break; }
}
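With nested loops, break leaves only the loop that directly contains it. A minimal sketch with hypothetical variable names:

```upl
variable $row as Numeric;
variable $col as Numeric;

for( $row := 1 ; $row <= 2 ; $row++ ) {
  for-each( $col in {1,4,5} ) {
    if( $col = 4 ) {
      break;                   /* leaves only the for-each loop */
    }
    println( "row " + $row + ", col " + $col );
  }
  /* execution continues here after the break, with the next $row */
}
```

The println call is reached once per row, for the list value 1 only.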

6.8. return

The return statement exits from the current function or method and control flow returns to where the function/method was invoked. The return statement has two forms: one that returns a value, and one that doesn't. To return a value, simply put the value (or an expression that calculates the value) after the return keyword.

Example 3.17. 

return $number;

The data type of the returned value must match the type of the function's declared return value. For methods, use the form of return that doesn't return a value:

return;
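Both forms of return can be sketched as follows. The function names sign-name and report are hypothetical; the second function is declared as Void and therefore uses return without a value.

```upl
function sign-name( $n as Numeric ) as String {
  if( $n < 0 ) {
    return "negative";
  }
  if( $n = 0 ) {
    return "zero";
  }
  return "positive";           /* every code path ends with a return */
}

function report( $n as Numeric ) as Void {
  if( $n = 0 ) {
    return;                    /* leave early without a value */
  }
  println( sign-name( $n ) );
}
```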

7. Exceptions and exception handling

UPL uses exceptions to handle errors and other exceptional events. When an error occurs during the execution of a statement, UPL throws an exception. This means that the normal flow of the program is interrupted and that the UPL processor attempts to find the innermost exception handling block that declares that it can handle the type of exception (error) that occurred. The exception handler can attempt to recover from the error or, if it determines that the error is unrecoverable, provide a gentle exit from the program.

Three statements play a part in handling exceptions:

  • The try keyword identifies a block of statements within which an exception might be thrown.

  • The catch keyword must be associated with a try-block and identifies a block of statements that can handle a particular type of exception. The statements within this block are executed if an exception of a particular type occurs within the try-block.

  • The finally keyword must be associated with a try-block and identifies a block of statements that are executed regardless of whether or not an exception occurred within the try-block.

The general form of the try{} … catch(){} statement can be expressed as follows:

try {
  /* statements where an exception might be thrown */
} catch ($e as ExceptionType) {
  /* statements that handle an exception
     $e is opaque, do not use it directly! Use to-string($e) to get a textual
     representation of the exception. */
} finally {
  /* statements executed regardless of whether or not an error occurred */
}

UPL allows you to specify multiple catch statements for a single try statement. Each catch statement specifies the exception type it handles. Details about the exception can be queried through the variable bound in the catch statement.

The following exception types are currently defined and used:

Exception

This type is the super-type of all the following, more specific types. You should therefore put the catch statement for Exception last after all more specific catch statements.

IOException

This exception signals an error during file IO operations. Functions that might throw this exception: write(), writeln()

EvalException

This exception indicates a dynamic error during UPL program execution. Details can be retrieved from the exception's message component, available through the to-string() function evaluated on the exception variable.

UserDefinedException

This exception is currently never thrown by the UPL implementation. You should use this type with the throw() function.

TypeConversionException

This exception is thrown when a value of a specific type cannot be cast to a different type. This exception can occur either in explicit casting functions (like to-color(), to-numeric(), …), but can also occur on implicit casting operations.
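The ordering advice for catch statements can be sketched as follows, with the most specific type first and Exception last. The sketch assumes that to-numeric() throws a TypeConversionException for a string that is not a lexical representation of a Numeric; the variable name $n is hypothetical.

```upl
variable $n as Numeric := 0;

try {
  $n := to-numeric( "not a number" );
} catch( $e as TypeConversionException ) {
  println( "conversion failed: " + to-string( $e ) );
} catch( $e as Exception ) {
  /* reached only for exception types not handled above */
  println( "unexpected error: " + to-string( $e ) );
} finally {
  println( "conversion attempt finished" );
}
```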

8. Expressions

An expression consists of one or more operands and zero or more operators linked together to compute a value. Operands and operators can be grouped for evaluation using parentheses ( and ). Essentially, precedence, associativity and order of evaluation are the same as in Java (and therefore as in most programming languages).
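A few worked cases, following the Java rules referred to above; the variable names are hypothetical.

```upl
variable $a as Numeric := 1 + 2 * 3;       /* multiplication binds tighter: 7 */
variable $b as Numeric := ( 1 + 2 ) * 3;   /* parentheses force the addition first: 9 */
variable $c as Numeric := 10 - 4 - 3;      /* left-associative: (10 - 4) - 3 = 3 */
```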

9. Variables

UPL allows you to define variables like in Java or C. Variables defined outside of any block are called global variables and can be accessed from anywhere within the program. Variables defined within a block are local to it, i.e. their scope is the block they were defined in.

Variable names are constructed as follows:

'$' identifier

To clearly differentiate variables from other, mostly constant, identifier usage, variable names always start with the dollar sign.

UPL supports three types of variables: local variables, external variables and realm variables.

9.1. Definition

9.1.1. Local variables

A local variable is defined as follows:

'variable' varname 'as' type ( ':=' init-expression )? ';'

For varname, you can use any valid Id, e.g. $my:counter or $i.

For type, any UPL type (except Void) is allowed, including the generic Value type which allows storing any of the specific types in the variable.

For the optional initialization part, init-expression can be any expression that evaluates to a value whose type is the same as the declared type for the variable.

Example 3.18. 

To declare a variable $s to hold a string and initialize it with the empty string, you'd write:

variable $s as String := "";

9.1.2. External variables

External variables allow you to store native UPL variable values at a certain context level, or "store", within a pipeline's static and dynamic execution context. A store is identified by a pre-defined name which – for static stores – corresponds to the already known upCast realms module, pipeline, and application.

There is one additional, dynamic store defined: run, which lets you store variables only for the duration of the run of a certain (top-level) pipeline.

External variables are therefore much like realm variables with the only difference that they are fully integrated into UPL's type system and keep their native UPL type (which is not the case for realm variables which are converted back and forth between native Java and UPL data types on a best effort basis).

An external variable is defined as follows:

'variable' varname 'as' type 'in' store ( 'default' default-expression )? ';'

For varname, you can use any valid variable name (which must be of type Id), e.g. $my:counter or $i.

For store, the following values are defined:

module

stores the variable in the context of the currently running module. This allows you to access it from any UPL code executed by that module, e.g. in its custom initialize or finalize methods or – if it's an UPL processor – from its main UPL program.

pipeline

stores the variable in the context of the currently running pipeline, i.e. the pipeline where the variable is defined or the nearest pipeline ancestor viewed from the point where the variable definition takes place (e.g. in a pipeline's module).

application

stores the variable in the context of the running upCast application. Note that the variable will only ever be disposed of when the upCast application quits. It can be accessed from any running pipeline at any point in time after its definition.

run

stores the variable in the context of the top-level (i.e., "main") pipeline instance that is currently running. This means that the variable is available to all UPL code executing at any sub-pipeline/module nesting level of the respective conversion run.

For type, any UPL type (except Void) is allowed, including the generic Value type which allows storing any of the specific types in the variable.

For the optional default part, default-expression can be any expression that evaluates to a value whose type is the same as the declared type for the variable. The default-expression is only relevant (and used) in the case where that variable was not already created and assigned a value at some earlier point in the execution flow.

Important

Once defined, a variable's type sticks with it through its full lifetime. It is an error to re-define an already existing external variable using a different type. The type of an external variable of a certain name in a specific store is therefore defined by the first time it gets set during pipeline execution.

Example 3.19. 

To declare a variable $s to hold a string while running this pipeline, and initialize it with the empty string in case it does not already exist, you'd write:

variable $s as String in pipeline default "";
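External variables are useful for state that must outlive a single module. The sketch below counts how often a piece of UPL code is executed during one conversion run; the variable name $docsSeen is hypothetical.

```upl
/* The default of 0 is used only if $docsSeen does not yet exist in the run store */
variable $docsSeen as Numeric in run default 0;

$docsSeen := $docsSeen + 1;
println( "executions in this run: " + $docsSeen );
```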

9.1.3. Realm variables

The significant difference from the other types of UPL variables is that realm variables aren't declared at all. What makes a variable a realm variable is the specific namespace ("realm") that its name (Id) is in.

UPL's realm variables mirror exactly the upCast variable system with its defined realms module, pipeline, application, environment and javaproperty.

In UPL, each of these realms is bound to a fixed namespace name, and any variable in such a namespace acts like an upCast variable.

Depending on the respective realm, it will be read-only (e.g. environment), or both readable and writable.

The namespace names and their binding to the various realms are described in detail in the upCast manual. The respective recommended namespace declarations are repeated here for your convenience:

#namespace application "http://www.infinity-loop.de/namespace/upcast-realm/application";
#namespace environment "http://www.infinity-loop.de/namespace/upcast-realm/environment";
#namespace pipeline "http://www.infinity-loop.de/namespace/upcast-realm/pipeline";
#namespace module "http://www.infinity-loop.de/namespace/upcast-realm/module";
#namespace javaproperty "http://www.infinity-loop.de/namespace/upcast-realm/javaproperty";

Example 3.20. 

In the following code fragment,

if( fs-exists( $pipeline:SourceFile ) ) { 
  ... 
}

the variable reference $pipeline:SourceFile returns the same string value as if accessing the very same variable in the upCast variable system by writing

${pipeline:SourceFile}

Values in the upCast variable system are stored as native Java objects. When writing or reading those values, UPL needs to translate them between its native types and corresponding Java types on a best-effort basis. The employed mapping between Java types and UPL types is shown in the following table:

| UPL type | Java type (class or interface) |
|----------|--------------------------------|
| Bool     | java.lang.Boolean              |
| Numeric  | java.lang.Integer              |
| Numeric  | java.lang.Double               |
| String   | java.lang.String               |
| Id       | java.lang.String               |
| List     | java.util.List                 |
| Null     | null                           |
|          | java.lang.Object               |

9.2. Reference

To refer to the contents of a variable, you simply write the variable's name, e.g.:

print( $s );

9.3. Assignment

To assign a new value to a variable, you use

varname ':=' expression ';'

Example 3.21. 

$s := "Hello world!";

would assign the string value "Hello world!" to the variable $s.


10. Parameters

Sometimes, you'd like to pass initial values from the outside, so-called parameters, to a UPL program. In UPL, this works much like <xsl:param> in XSLT. Parameters are very similar to variables, with the only difference being that they can take their initial values from outside the UPL execution environment by way of an implementation-defined mechanism.

Once a parameter is defined, assigning a new value to it is not allowed.

10.1. Definition

A parameter is defined as follows:

'parameter' parname 'as' type ( 'default' default-expression )? ';'

For parname, you can use any valid variable name, e.g. $mode or $p.

For type, any UPL type (except Void) is allowed, including the generic Value type which allows storing any of the specific types in the variable. It is up to the implementation how parameter values of a specific type are created.

For the optional default part, default-expression can be any expression that evaluates to a value whose type is the same as the declared type for the parameter. Note that the default-expression is only relevant (and used) in the case where the surrounding environment does not supply a value for that specific parameter at all. In all other cases, the supplied value is used instead of the default-expression's value.

Important

Parameters can only be defined at the top level of a UPL program, outside of any block, and before any rule, function or method definition.

10.2. Reference

To refer to the contents of a parameter, you simply write the parameter's name, e.g.:

print( $mode );
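Putting definition and reference together, a minimal sketch could look as follows. The parameter value "draft" and the function banner() are hypothetical, and the sketch assumes that a top-level parameter is visible inside functions in the same way as a global variable.

```upl
/* Used only if the surrounding environment supplies no value for $mode */
parameter $mode as String default "draft";

function banner() as String {
  return "running in " + $mode + " mode";
}
```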

11. Functions

UPL allows you to define custom functions. Functions do not need to be declared before being used. Their definition, however, must always be part of the executed UPL program, though it does not matter whether they have been #included from another file or are part of the main UPL program unit.

11.1. Defining a function

A function definition takes the following form:

'function' funcname '(' ( param 'as' type )? (',' param 'as' type )* ')' 'as' restype '{' body '}'

with

funcname

the name of the function, which must be a UPL identifier

param

the name of a formal parameter, which must be a UPL variable name

type

the required type for the formal parameter param, which must be one of the UPL types Bool, Id, List, Numeric, String. Note that the generic type Value is not an allowed type for a function parameter.

restype

the type of the result of the function, which must be one of the UPL types Bool, Id, List, Numeric, String, Value (generic type for returning any of the five specific types) and the special type Void (signalling that the function does not return a value at all).

body

a sequence of statements and/or variable definitions; each code path must end with a return statement returning a value of the specified restype, unless the function is defined with a return type of Void, in which case you must either use return without a value or not use return at all.

Example 3.22. Function definition

function hello( $name as String ) as String {
  variable $result as String := "Hello ";
  if( $name = "" ) {
    $result := $result + "you!";
  } else {
    $result := $result + $name + "!";
  }
  return $result;
}

defines a function returning a value.

function greeting( $name as String ) as Void {
  print( "Hello " + $name );
}

defines a function without a return value.


11.2. Calling a function

To call a built-in or user-defined function, use:

name '(' param? (',' param)* ')'

Example 3.23. Calling a function

greeting( "Christian" );        /* prints "Hello Christian" on the console */
log( INFO, hello( "Steven" ) ); /* writes "Hello Steven!" to the logfile */
print( hello( "" ) );           /* prints "Hello you!" on the console */
$res := hello( 5 );             /* throws a TypeConversionException: parameter type does not match */

Function definitions are looked up based on the actual types of parameters used in the call at the time of the call. As a consequence, when the actual types do not exactly match one of the available function definitions, UPL throws a TypeConversionException (which can be caught in a try … catch construct).

You can overload functions of the same name by specifying different signatures.

You cannot overload a function on its return type only.
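Overloading and the exact-match lookup can be sketched as follows; describe() is a hypothetical function with two signatures and no definition for a Bool argument.

```upl
function describe( $v as String ) as String {
  return "string: " + $v;
}

function describe( $v as Numeric ) as String {
  return "number: " + $v;
}

println( describe( "abc" ) );      /* selects the String definition */
println( describe( 42 ) );         /* selects the Numeric definition */

try {
  println( describe( true() ) );   /* no definition takes a Bool */
} catch( $e as TypeConversionException ) {
  println( "no matching definition: " + to-string( $e ) );
}
```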

12. Java Function Bindings

UPL allows you to define function bindings to static Java functions. The defining Java classes need to be on the Java classpath at function execution time.

The mechanism used is similar to the one XSLT processors use for extension functions: The function identifier is a namespaced Id, with the namespace name designating the implementing Java class, and the local name being the name of the static function to call in that class.

12.1. Defining a function binding

A Java function binding takes the following form:

'javafunction' prefix:funcname '(' ( param 'as' type )? (',' param 'as' type )* ')' 'as' restype ';'

with

prefix

the namespace prefix bound to the namespace name that identifies the Java class containing the static method to bind. The namespace name must have the form java:Fully.Qualified.Classname

funcname

the name of the Java method in the class designated by the namespace name bound to prefix

param

the name of a formal parameter, which must be a UPL variable name

type

the required type for the formal parameter param, which must be one of the UPL types Bool, Id, List, Numeric, String, Value.

restype

the type of the result of the function, which must be one of the UPL types Bool, Id, List, Numeric, String, Value (generic type for returning any of the five specific types) and the special type Void (signalling that the function does not return a value at all).

12.2. Calling a bound Java function

To call a bound Java function, use:

prefix:funcname '(' param? (',' param)* ')'

Example 3.24. Java function binding example

Suppose we wanted to write a custom Java class that reverses the order of characters of a string. The Java code could look like the following:

package com.example;
import de.infinityloop.upcast.upl.val.UPLString;

public class StringReverser {
  public static UPLString reverse( UPLString source ) {
    return new UPLString( new StringBuffer( source.getAsString() ).reverse().toString() );
  }
}

To create the function binding to that Java function in UPL, use the following code:

#namespace sr "java:com.example.StringReverser";
javafunction sr:reverse( $source as String ) as String;

And to finally call that function somewhere, you could use code like the following:

...
println( sr:reverse("never odd or even") ); // a palindrome :-)
...

which should print the following to the system console:

neve ro ddo reven

13. Directives

UPL supports a small number of directives that set general execution options for the program at hand or perform operations such as including external source files or binding a namespace to a prefix.

Directives always start with a hash mark character.

13.1. #charset

This directive allows you to declare the encoding (or character set) used for the following UPL program (including the #charset directive). The use of the #charset directive has the same requirements as the @charset rule defined in CSS2.1. We therefore just include the relevant parts of section 4.4 of the Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification by reference, the only difference being that in UPL, the directive is #charset (with a hash at the beginning) instead of @charset.

Important

The #charset directive is only allowed in external UPL code files. It is an error to have pipeline document-embedded UPL code start with a #charset directive.

All upCast editor components are Unicode savvy and pipelines are always saved as UTF-8, so there is no need to specify the encoding for embedded UPL code since it is fixed.

13.2. #set

The generic #set directive allows setting options for processing the UPL program at hand in a certain way. The syntax for the #set directive is as follows:

'#set' option ':=' value ';'

with:

option

the name of an option as listed below

value

the value constant to set the option to

The following options are available (default value when not specified printed in bold):

Each option below is listed with its name (as Id), its value range, and a description.

singleStep

Value range: true | false

When true, UPL program execution is stopped after each statement, requiring user interaction for execution continuation. This is intended for debugging uses.

defaultRuleMode

Value range: "break" | "continue" | "exit" | "jump:labelname"

Sets the default rule mode to use for tree traversal using this file. This can be overridden in rules using set-rulemode() and the current value can be queried using get-rulemode().

traceRuleApplication

Value range: true | false

When true, the execution of all selected rules is logged to the logging system. This is intended for debugging uses only since it can generate an enormous amount of data.

leaveEvents

Value range: true | false

When true, the rule selection and execution algorithm is performed twice per node, once at entering time (i.e. before processing the node's children), and once at leaving time (after having processed a node's children). You can check for the current execution mode using the entering() and leaving() functions.

IDAttributes

Value range: '"' (((elemqname|'*') '/' )? '@' attqname ' ')* '"'

A string of whitespace-separated patterns identifying attributes that need to be treated as having the type ID. This ensures that in element splitting operations (as can happen in e.g. markup-regex() or mark-split()), the cloned element does not get a copy of this attribute which would violate its ID status.

Example 3.25. 

#set IDAttributes := "figure/@figid @elemid";

would identify both the figid attribute on figure elements and the elemid attribute on all elements as attributes of type ID.


The default value is "".

Whenever the option is set, it replaces any previous setting.

Any attribute that has a specified type of ID (e.g. due to information gathered by the parser from a DTD) will be treated as such, regardless of whether it is listed or not.

groupQName

Value range: elemqname

The qualified name specified is used for all grouping elements created by the module it is defined in. This includes groupings by all variants of markup-regex(), markup-style(), and by the Grouper post-processing step.

If this directive is not set, the default behaviour (fallback) is used, i.e. upCast tries to use either uci:inline or uci:block for the grouping element, depending on its position in the tree hierarchy. If this directive is set, the specified qualified name will always be used, independently from the positional hierarchy in the document tree.

The advantage of being able to set a dedicated grouping element name is that code looking for it need not use different names depending on the tree hierarchy position. You can also use more specific grouping elements for certain areas of grouping actions (if you encapsulate those into one UPL tree processor module).

The @uci:type attribute is set on the specified element in any case.

Caveats

If you paint nodes in one module (and set the groupQName option there), but do the actual grouping in a different module, the setting of the module that actually performs the grouping is used. If you use a module dedicated only to performing the grouping (on nodes painted elsewhere) and want a custom grouping element, you must specify the respective #set groupQName option in the UPL code of the module that actually performs the grouping!

Example 3.26. 

To have groups created using the element uci:group, use:

#namespace uci "http://www.infinity-loop.de/namespace/2006/upcast-internal";
#set groupQName := uci:group;

This will use <uci:group uci:type="...">...</uci:group> for the grouping element.


13.3. #namespace

To be able to use namespace prefixes in identifiers, you need to bind them to their respective namespace name. This is accomplished using the #namespace directive, which declares namespaces and their prefixes for use in UPL:

'#namespace' nsprefix '"' nsname '";'

with

nsprefix

the namespace prefix you want to use in identifiers to refer to the declared namespace. This must be an identifier, except that it may not contain the colon ':' character.

nsname

the namespace name, typically a URI, to bind the namespace prefix to

Common namespace declarations

To access properties in the internal tree created by the RTF Importer module, we recommend defining the following namespace declarations at the top of your UPL code for convenience:

#namespace uci "http://www.infinity-loop.de/namespace/2006/upcast-internal";
#namespace css "http://www.infinity-loop.de/namespace/2006/upcast-css";
#namespace cssc "http://www.infinity-loop.de/namespace/2006/upcast-cssclass";
#namespace csso "http://www.infinity-loop.de/namespace/2006/upcast-cssoverride";
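As a minimal sketch of how a declared prefix is then used in an identifier, the following rule refers to an element by its prefixed name. It assumes that the document tree contains uci:par elements (the name is borrowed from the to-id() example in the function reference) and that element() accepts a prefixed name in the same way as the unprefixed names in the rule examples of the UPL Tree Processor chapter. The printed text is arbitrary.

```upl
#namespace uci "http://www.infinity-loop.de/namespace/2006/upcast-internal";

// Matches every uci:par element; the prefix uci resolves to the
// namespace name bound by the #namespace directive above.
[element(uci:par)]
{
  print( "found a uci:par element" );
}
```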

13.4. #include

The #include directive allows including UPL code verbatim from external files. The result of an #include directive is exactly the same as if the directive itself were replaced by the contents of the included file.

The syntax of the #include directive is as follows:

'#include(' 'source:' srcfile ('encoding:' encodingname)? ('encrypted:' cryptkey)? ('namespace-mode:' mode)? ');'

with

srcfile

the source file specification to the code or file to include. This may be either an absolute path to the file in local file system convention or URL format, a relative path, or a reference to an internal variable.

If the path is relative, it is resolved against the current UPL file's base URI. The base URI of a top-level UPL file (as specified e.g. in an upCast Module) is the base URI of the pipeline document containing that module or that top-level UPL code.

You can also use the syntax "upcast:realm:varname[#as-code|#as-file]" to include code verbatim from an upCast internal variable, or to retrieve the path of the file to include from such a variable, with:
realm = the upCast realm to fetch from, usually "pipeline"
varname = the name of the variable in the respective realm
#as-code = treat the contents of the variable as the UPL code to use (this is the default when not specified)
#as-file = treat the contents of the variable as the file path of the file to include

encodingname

is the (optional) Java name of the encoding the file to be included is in. This value is a fallback value for those cases where the encoding can not be automatically detected using a possibly present BOM at the beginning of the file, and if the file to be read does not include a #charset directive at its very beginning.

encrypted

this (optional) parameter indicates that the file to be read has been encrypted, and specifies its decryption key. When the key string is empty or fewer than 30 characters long, an upCast-internal decryption key known only to infinity-loop is used. If the key is 30 characters long, it is used as the decryption key for the file to be included.

namespace-mode

this (optional) parameter indicates whether any namespace declarations in the included UPL file should be considered local to the included file (value: "local"), or added verbatim to the namespace declarations of the including file (value: "global"). The default value when not specified is "local".

Example 3.27. 

#include( source:"file:///C:/functions.upl" encoding:"UTF-8" );

reads and parses the file C:\functions.upl with UTF-8 encoding (unless the file specifies a different encoding itself) and places its contents at the respective location in the including file.

#include( source:"upcast:pipeline:uplcode#as-code" );

or short:

#include( source:"upcast:pipeline:uplcode" );

reads and parses the text contents of the pipeline variable "uplcode".

#include( source:"upcast:pipeline:uplfile#as-file" );

reads and parses the file whose full, absolute path is stored in the pipeline variable "uplfile".
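A relative srcfile could look like the following sketch. The file name lib/util.upl and the folder layout mentioned in the comment are hypothetical; the path is resolved against the base URI of the including UPL file, as described for srcfile above.

```upl
// Hypothetical layout: if the including file is /project/main.upl,
// this resolves to /project/lib/util.upl.
#include( source:"lib/util.upl" );
```

Since no namespace-mode is given, namespace declarations in the included file stay local to it.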


Tip

You may want to explicitly opt for the "global" value of the namespace-mode option when the included UPL code is meant to serve as a common UPL header that defines the namespaces and corresponding prefixes in one place (and therefore consistently) for many UPL programs in your project. You can also add variable definitions or further #include()s here that you want to use in all of your UPL programs, e.g. a library of utility functions.

Example 3.28. Using namespace-mode for consistent namespace declarations

Suppose you have the following file header.upl:

#namespace uci "http://www.infinity-loop.de/namespace/2006/upcast-internal";
#namespace pipeline "http://www.infinity-loop.de/namespace/upcast-realm/pipeline";
#namespace module "http://www.infinity-loop.de/namespace/upcast-realm/module";
#namespace css "http://www.infinity-loop.de/namespace/2006/upcast-css";
#namespace cssc "http://www.infinity-loop.de/namespace/2006/upcast-cssclass";
#namespace csso "http://www.infinity-loop.de/namespace/2006/upcast-cssoverride";
#namespace javaproperty "http://www.infinity-loop.de/namespace/upcast-realm/javaproperty";
#namespace app "http://www.infinity-loop.de/namespace/upcast-realm/application";
#namespace env "http://www.infinity-loop.de/namespace/upcast-realm/environment";
// upCast Utility library:
#namespace util "http://www.infinity-loop.de/namespace/upl/utility-functions";
// Project custom namespaces:
#namespace my "http://www.example.com/my-namespace";
#namespace annot "http://www.example.com/annotations";
#namespace xh "http://www.w3.org/1999/xhtml";
// default grouping element name:
#set groupQName := uci:group; // the element name to use for grouping

You can then include the above, and have all the namespace declarations available and the groupQName option set identically in all your UPL code, by starting each file with the single line

#include( source:"header.upl" namespace-mode:"global" );

The namespace-mode:"global" setting will effectively add all of the namespace declarations in header.upl to the UPL program that includes that file, as if they had been written out explicitly there. This pattern allows you to quickly add or change namespace declarations for all your UPL code in a project by just editing that single header.upl file.


14. Debugging

14.1. Breakpoints

The code sequence

##l##

is used to set a breakpoint on the subsequent statement.

Example 3.29. 

##l## debug( "after" );

When execution reaches this point in the program, it is suspended and a dialog is shown that lets you choose how to proceed:

next line

resumes execution until the next line in source code is reached

step

steps into and through the statements/expressions/calls

terminate

terminates execution of the current program

resume

resumes normal execution of the current program

14.2. Watchpoints

The code sequence

##v#varname##

is used to set a watchpoint on the UPL variable varname (name only, without leading ‘$’). That means that every time the variable $varname is accessed (either for reading or for writing), program execution is suspended and a dialog with choices for how to proceed is shown.

Example 3.30. 

##v#var##

function main() as Numeric {
  variable $var as String := "a";
  $var := "b";
  debug( $var );
  $var := "c";
  debug( $var );
  return 0;
}

In the above, execution is suspended for every statement (except for the return) because in each one, the variable $var is accessed for definition, reading or writing.


When execution is suspended, the following choices are available:

next line

resumes execution until the next line in source code is reached

step

steps into and through the statements/expressions/calls

terminate

terminates execution of the current program

resume

resumes normal execution of the current program

Chapter 4. UPL Tree Processor

The UPL Tree Processor is a language extension to UPL Core. It is designed to run defined actions for a node in upCast's internal document tree when a condition for that node matches. In this sense, it is very similar to CSS' notion of selectors and associated declaration blocks.

This module allows you to specify actions to be taken in a declarative way.

1. Building blocks

A UPL program to be run by the UPL Tree Processor consists of several building blocks.

1.1. UPL Core constructs

A UPL Tree Processor program can contain any constructs of UPL Core.

1.2. Rules

A UPL Tree Processor program typically also contains rules. Rules are a pair consisting of a selector and an action block. The selector specifies a condition the context node must satisfy before the actions specified in the action block are applied to it.

The syntax for a rule is

(label ':')? selectorlist actionblock

where selectorlist is a list of selectors separated by a comma (',') and actionblock is an action block.

Example 4.1. 

The following rule prints the value of the context node's level attribute to the console if the context node is a heading element and has that attribute:

[element(heading) and exists(@level)]
{
  print( @level );
}

The following rule sets the delete attribute to true on inline elements that do not have any textual contents and all span elements whose textual contents is the string "empty":

[element(inline) and string()=''],
[element(span) and string()="empty"]
{
  set-attr( delete, true );
}
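The optional label from the rule syntax is not used in the rules above. A sketch of a labeled rule follows; the label name cleanup is hypothetical, and the rule body reuses the set-attr() call from the previous rule.

```upl
// "cleanup" labels this rule so that it can be referred to by name,
// for example as the target of the jump:label rule mode.
cleanup: [element(inline) and string()='']
{
  set-attr( delete, true );
}
```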

1.2.1. Selector

A selector looks very similar to an XPath predicate or predicate list:

('[' expression ']')+

expression can be any UPL expression.

If every expression of the selector evaluates to true on the context node, the selector is considered true. If a selector is true, the actions in the following action block are executed.

Tip

You can think of a UPL selector as one or more XPath predicates being applied from left to right on a source sequence that contains just the context node. If that sequence still contains the context node after all predicates have been applied, the action block is executed.
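Because a selector may consist of several predicates, the heading rule from Example 4.1 could presumably also be written with two predicates instead of a single and expression. This is a sketch derived from the selector syntax above, not a statement about which form is preferable.

```upl
// Both predicates must hold for the context node:
// first element(heading), then exists(@level).
[element(heading)][exists(@level)]
{
  print( @level );
}
```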

1.2.2. Action block

An action block looks just like an ordinary block in UPL Core:

'{' statements '}'

statements can be any sequence of statements as defined in UPL Core. They are executed on the context node in the order specified when the associated selector yields true. You can think of the action block as the body of a function that takes no explicit parameters, returns a value, and is run when the selector matches.

2. Processing model

When the UPL Tree Processor runs, the following steps are executed in order:

2.1. Step 1: Run initialize()

If the UPL Tree Processor program defines a function of the signature

function initialize() as Value

then it is run.

If the function returns the Numeric zero (0), the next step in the processing model is executed, i.e. the tree traversal is started.

If the function returns a non-zero Numeric value, UPL Tree Processor execution is aborted and the following steps are not executed.
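A minimal initialize() could look like the following sketch. The debug message is arbitrary; the function body style follows the main() example in the Watchpoints section.

```upl
// Runs once before the tree traversal starts.
function initialize() as Value {
  debug( "initialize: starting tree traversal" );
  return 0;  // Numeric zero: continue with step 2
}
```

Returning a non-zero Numeric instead would abort processing before any node is visited.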

2.2. Step 2: Walk the document tree

Next, the document tree is traversed in a depth-first traversal. The document tree is the internal upCast document tree. It is usually constructed in an earlier running importer module like the RTF Importer or the XML Importer.

The starting node of the traversal is the Document node, which is then the first context node. The context node changes during the tree traversal. Many pre-defined functions require that a context node is defined to be useful.

For the context node, the rules defined in the UPL Tree Processor program are considered in the order in which they are written (= defined), from top to bottom. The first rule whose selector expression yields true on the context node is chosen and its action block is executed.

  • If after executing the action block, the internal rule mode variable is set to break, no further rules are considered, but the next node in document order is chosen as the new context node and with this node, the process of rule consideration starts again at the first defined rule.

  • If after executing the action block, the internal rule mode variable is set to continue, the next (and possibly further) rules are considered, still with the same context node. If one of the remaining rules' selector matches, its action block is executed and depending on the state of the internal rule mode variable after that, the corresponding action as described is taken. If no further rule matches, the next node in document order is chosen as the new context node and with this node, the process of rule consideration starts again at the first defined rule.

  • If after executing the action block, the internal rule mode variable is set to exit, no further rules are considered and also the tree traversal ends without considering any further document nodes.

  • If after executing the action block, the internal rule mode variable is set to jump:label, the next rule after the specified label marker (and possibly further ones down) are considered, still with the same context node. If one of the selectors of the rules following label matches, its action block is executed and depending on the state of the internal rule mode variable after that, the corresponding action as described is taken. If no further rule matches, the next node in document order is chosen as the new context node and with this node, the process of rule consideration starts again at the first defined rule.

This process continues until all nodes have been visited.

See also: set-rulemode()
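To make the rule consideration order concrete, consider the following sketch with two rules that can both match the same node. The element and attribute names are taken from Example 4.1; the printed text is arbitrary. How the rule mode is set is deliberately left out here, see set-rulemode().

```upl
// Rule 1: matches every heading element.
[element(heading)]
{
  print( "rule 1" );
}

// Rule 2: matches only heading elements that have a level attribute.
[element(heading) and exists(@level)]
{
  print( @level );
}
```

For a heading element with a level attribute, rule 1 is considered first and its action block runs. If the rule mode is break afterwards, rule 2 is skipped for this node and the traversal moves on to the next node in document order. If it is continue, rule 2 is considered with the same context node and its action block runs as well.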

Traversal is sequential and deterministic

The traversal takes place in a defined, deterministic, strictly sequential manner. This is in contrast e.g. to XSLT, where template selection and application is non-deterministic and often parallel (though the result is of course well defined), depending on the implementation.

2.3. Step 3: Run finalize() / error-finalize()

After the tree traversal has finished without any errors, and if it is defined, the UPL Tree Processor tries to call the function

function finalize() as Value

The result of this function is set as the upCast UPL Tree Processor module's result value in the pipeline variable ModuleResult.

If errors occurred during UPL Tree Processor execution, however, it will try to call the function

function error-finalize() as Value

instead. The result of this function is set as the upCast UPL Tree Processor module's result value in the pipeline variable ModuleResult.
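The two functions could be defined as in the following sketch. The return values 0 and 1 are arbitrary illustrations; what a later pipeline step makes of ModuleResult is up to your pipeline.

```upl
// Called after a traversal without errors.
function finalize() as Value {
  return 0;
}

// Called instead of finalize() if errors occurred.
function error-finalize() as Value {
  return 1;
}
```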

Chapter 5. UPL Function reference

1. Type casting functions

1.1. copy

copy(val as Value) as Value

val (Value)

Creates a deep copy of the passed Value val which is then independent of any later modifications of val.

Note

This is especially useful for List values which are usually modified in-place by the respective functions like append(), remove() etc.
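A sketch of the typical use follows. The variable names are hypothetical, and the example assumes that the Value returned by copy() can be assigned to a variable declared as List.

```upl
function main() as Numeric {
  variable $original as List := { "a", "b" };
  // $snapshot is a deep copy and no longer shares data with $original.
  variable $snapshot as List := copy( $original );
  // Later in-place modifications of $original, for example through
  // append() or remove(), do not affect $snapshot.
  return 0;
}
```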

1.2. to-bool

to-bool(value as Value) as Bool

value (Value): value to convert to a bool

Returns the effective boolean value of its argument.

Note

Calculating the effective boolean value of the argument is very different from casting the argument to a Bool.

The effective boolean value of a Bool is its own value.

The effective boolean value of a Color is always true.

The effective boolean value of an Id is true if the identifier length is greater than zero (0), false otherwise.

The effective boolean value of a List is true only if it has at least one item and if the effective boolean value of each of its items is true. (It follows that the effective boolean value of an empty list is false.)

The effective boolean value of a Null is false.

The effective boolean value of a Numeric is false if it is zero (0), true otherwise.

The effective boolean value of a String is true if its length is greater than zero (0), false otherwise.

Example 5.1. 

to-bool( "false" )

returns true because the length of the string "false" is greater than 0.

to-bool( { "abc", 0 } )

returns false because even though the effective boolean value of the list's first element is true, the effective boolean value of the numeric second element of the list is false. Compare this to

to-bool( { "abc", 3.1415 } )

which returns true because the effective boolean values of all of the list's elements are true.


1.3. to-color

to-color(value as Value) as Color

value (Value): value to convert to a color

Converts its argument to a Color value. The argument must be a valid CSS 2.1 color value string. Additionally, rgba() from the CSS 3 Color Module is supported.

The Color value of an Id is its value parsed to a color as described above.

The Color value of a String is its value parsed to a color as described above.

If a value cannot be parsed into a color as described above, a TypeConversionException is thrown.

Trying to cast Bool, List, Numeric or Null to a Color throws a TypeConversionException.

Example 5.2. 

to-color( red )

is the same as

to-color( "#F00" )

which is the same as

to-color( "#ff0000" )

which is the same as

to-color( "rgb(255,0,0)" )

which is the same as

to-color( "rgb(100%,0%,0%)" )

which is the same as

to-color( "rgba(255,0,0, 1.0)" )

which all designate the color red.


1.4. to-id

to-id(value as Value) as Id

value (Value): value to convert to an id

Converts its argument to an Id value.

The Id value of an Id is its own value.

The Id value of a String is its contents.

Trying to cast a Bool, Color, List, Numeric or Null to an Id throws a TypeConversionException.

Example 5.3. 

to-id( "uci:par" )

returns the Id uci:par.


1.5. to-list

to-list(value as Value) as List

value (Value): value to convert to a list

Converts its argument to a List value.

The list value of a List is itself.

The list value of a Numeric, Color, Id, String or Null is a one-element list with value as its element.
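The two cases described above can be illustrated as follows. The argument values are arbitrary; the results in the comments follow directly from the rules stated for to-list().

```upl
to-list( "abc" )      // a one-element list: { "abc" }
to-list( { 1, 2 } )   // the list { 1, 2 } itself
```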

1.6. to-null

to-null(value as Value) as Null

value (Value): arbitrary value

Converts its argument to a Null value, i.e. it effectively returns the Null value null always.

1.7. to-numeric

to-numeric(value as Value) as Numeric

value (Value): value to convert to a numeric value

Converts its argument to a Numeric value.

The numeric value of a Bool is 1 (if it is true), or 0 (if it is false).

The numeric value of a Numeric is itself.

The numeric value of a String is its value parsed as a decimal number (either with or without a dimension specification). If the value cannot be parsed, a TypeConversionException is thrown.

For Color, Id, List and Null a TypeConversionException is thrown.
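A few illustrative calls follow. The argument values are arbitrary; the results in the comments follow from the conversion rules stated above.

```upl
to-numeric( true )       // 1
to-numeric( false )      // 0
to-numeric( "42" )       // 42
to-numeric( { 1, 2 } )   // throws a TypeConversionException
```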

1.8. to-numeric

to-numeric(numstring as Value, radix as Numeric) as Numeric

numstring (Value): string representation of numeric value
radix (Numeric): numbering base to use for interpretation of numstring (2..36)

Converts numstring to a numeric value by first casting it to a String, then interpreting the result as a number representation in the specified radix.

Returns the number for which numstring is a literal representation in numbering base radix.

When numstring cannot be interpreted as a number with the specified radix, a TypeConversionException is thrown.

Example 5.4. 

to-numeric( "20", 16 )
to-numeric( "40", 8 )
to-numeric( "100000", 2 )

all return a Numeric with value 32 (assuming the decimal numbering system).


See also: to-string()

1.9. to-string

to-string(value as Value) as String

value (Value): value to convert to a string

Converts its argument to a String value.

The string value of a Bool is "true" or "false", respectively.

The string value of a Color is the hex notation of the represented color as defined in HTML, format: #rrggbb

The string value of an Id is its value.

The string value of a List is the concatenation of its member elements cast to String, with U+0020 (space character) as a separator added between two individual values.

The string value of a Numeric is its human readable representation.

The string value of a String is itself.

Trying to cast Null to a String throws a TypeConversionException.

Example 5.5. 

to-string( 5 )

returns the string "5".

to-string( { "ab", 2, someId } )

returns the string "ab 2 someId".


1.10. to-string

to-string(value as Numeric, radix as Numeric) as String

value (Numeric): numeric value (power must be 0)
radix (Numeric): numbering base to use for conversion (2..36)

Converts its Numeric value argument to a String representation in the specified radix (numerical base).

Example 5.6. 

to-string( 32, 16 )

returns the string "20",

to-string( 32, 8 )

returns the string "40", and

to-string( 32, 2 )

returns the string "100000".


See also: to-numeric()

2. Functions on Colors

2.1. get-color-component

get-color-component(color as Color, component as Id) as Numeric

color (Color): color value
component (Id): RED | GREEN | BLUE | ALPHA

Returns an individual color component value of the Color value passed as argument color. The component to return is determined by the component parameter, which can take the Id values RED, GREEN, BLUE and ALPHA.

The Numeric value returned is between 0 and 255 for the components RED, GREEN and BLUE, and between 0.0 (=fully transparent) and 1.0 (opaque) for ALPHA.

Example 5.7. 

get-color-component( to-color( red ), RED )

returns 255

get-color-component( to-color( red ), ALPHA )

returns 1.0

get-color-component( to-color( "#123456" ), GREEN )

returns 52


3. Date & Time functions

3.1. current-dateTime

current-dateTime() as String

Returns the current date and time in the form of an ISO date string.

Example 5.8. 

The code

current-dateTime()

will, for example, return a String similar to the following (assuming the executing machine is located in Munich, Germany, and daylight saving time is in effect):

"2008-07-31T23:49:22.599+02:00"

3.2. format-dateTime

format-dateTime(dateTime as String, format as String) as String

dateTime (String): date and time as String in ISO format
format (String): formatting string suitable for use in java.text.SimpleDateFormat

Returns dateTime formatted according to format. dateTime must be an ISO date and time string. format is a formatting string for the individual components of the passed ISO date string.

The formatting options are the same as for the Java class java.text.SimpleDateFormat – please see there for details.

Example 5.9. 

The code

format-dateTime( "2008-07-31T23:44:20.923Z", "h:mm a")

will return the String

"11:44 PM"

3.3. format-dateTime

format-dateTime(dateTime as String, component as Id) as Numeric

dateTime (String): date and time as String in ISO format
component (Id): the component to extract from the dateTime string, one of: YEAR, MONTH, DAY_OF_MONTH, DAY_OF_YEAR, DAY_OF_WEEK, HOUR_OF_DAY, MINUTE, SECOND, MILLISECOND

Returns the specified component from a formatted ISO date string. dateTime must be an ISO date and time string. component is the selector for the desired component, and can be one of the following:

YEAR

number of year (Gregorian Calendar)

MONTH

number of month (1 = January, 12 = December)

DAY_OF_MONTH

number of the day in the month (1..31)

DAY_OF_YEAR

number of the day within the year (1..366)

DAY_OF_WEEK

number of the day in the current week (1 = Sunday, 2 = Monday, ... 7 = Saturday)

HOUR_OF_DAY

hour in the day (0..23)

MINUTE

minute in the hour (0..59)

SECOND

second in the minute (0..59)

MILLISECOND

millisecond in the second (0..999)

Note

The values returned are the values after normalizing dateTime to UTC.

Example 5.10. 

The code

format-dateTime( "2008-07-31T23:49:22.599+02:00", HOUR_OF_DAY )

will return the Numeric

21
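The effect of the UTC normalization mentioned in the Note is most visible when it crosses a date boundary. In the following sketch the input value is arbitrary; it denotes 00:30 on 1 August 2008 at an offset of +02:00, which corresponds to 22:30 on 31 July 2008 in UTC.

```upl
format-dateTime( "2008-08-01T00:30:00.000+02:00", DAY_OF_MONTH )   // 31
format-dateTime( "2008-08-01T00:30:00.000+02:00", MONTH )          // 7
format-dateTime( "2008-08-01T00:30:00.000+02:00", HOUR_OF_DAY )    // 22
```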

4. File functions

4.1. add-to-zip-archive

add-to-zip-archive(zipfile as String, baseFolder as String, contents as List) as Void

zipfile (String): the zip file to add to (if it already exists)
baseFolder (String): the base folder to calculate relative entry paths from
contents (List): description of ZIP file contents

This function lets you add files or complete folders to a (possibly existing) ZIP file.

The parameter zipfile specifies the absolute path of the ZIP file to add files or folders to.

baseFolder specifies the base folder with respect to which any items' paths within the zip file are calculated. You can only add items that are located under this base folder.

contents is a List-based data structure that identifies the items to be added to the zip archive, and with which options. The structure of contents is a List of item specifications, where each item specification itself is a List and has the following structure:

{ itempath, { {optionname1, optionval1}, {optionname2, optionval2}, … } }

with:

itempath

the absolute path to the item to be included in the zip archive. This must be located under baseFolder. The path may designate a folder or a single file.

optionname

the name of an option for that item. See below for available options.

optionval

the value for the respective option. See below for available option values.

Available options

Option Name

Values

METHOD

Specifies the compression method.

STORED

the item is just stored in the archive, but not compressed. This may be useful for container formats that require an uncompressed item as their first entry, as is the case for EPUB (OCF format).

DEFLATED

the item will be compressed into the archive

PREFIX

When specified, the relative entry name calculation based on baseFolder is suppressed. Instead, the specified prefix String is prepended to the item's name.

Example 5.11. 

Assuming baseFolder has the value "/docs/" and the folder /docs/feb/ contains the files marketing.doc and techspec.doc, adding the following itempath with the respective prefix results in the zip entries shown:

itempath                  PREFIX          zip entry

/docs/feb/marketing.doc   "press/"        press/marketing.doc

/docs/feb/marketing.doc   ""              marketing.doc

/docs/feb/marketing.doc   "press"         pressmarketing.doc

/docs/feb/marketing.doc   not specified   feb/marketing.doc

/docs/feb/                "press/"        press/marketing.doc
                                          press/techspec.doc

/docs/feb/                ""              marketing.doc
                                          techspec.doc

/docs/feb/                "press"         pressmarketing.doc
                                          presstechspec.doc

/docs/feb/                not specified   feb/marketing.doc
                                          feb/techspec.doc


SEARCHFLAGS

Allows you to define whether when specifying a folder entry, its contents is recursively added (files and folders) or only its files, not recursing into subfolders. Additionally, it lets you specify whether hidden files should be included in the archive or not.

F

include files

D

recurse into subfolders

H

include hidden files or folders

The desired flags must be concatenated (in any order), and must be written in capital letters as shown.

The default value for SEARCHFLAGS when not specified is "FD", i.e. descend into subfolders, collecting all files and folders recursively, but do not include hidden files.

For an itempath "/docs/feb/", specifying "FH" for SEARCHFLAGS will include any files (visible or hidden) in the folder /docs/feb/, but will not recursively descend into and include any folders that might be in /docs/feb/.
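Written out as a call, the "FH" case just described might look like the following sketch. The zip file path /dist/files.zip is borrowed from the example further below, and the flag value is passed as a String, as in the description above.

```upl
add-to-zip-archive( "/dist/files.zip", "/docs/",
  {
    { "/docs/feb/",
      { { SEARCHFLAGS, "FH" } }  // files, including hidden ones; no subfolders
    }
  } );
```

With the folder contents from the PREFIX example, this adds feb/marketing.doc and feb/techspec.doc, plus any hidden files located directly in /docs/feb/.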

When adding an itempath to a zip archive whose resulting zip entry or entries already exist in the archive, the existing entry or entries are replaced by the current data of the newly added itempath (refresh).

When zipfile does not yet exist, it is created.

The function will throw an EvalException when creating or adding to zip archives fails for some reason (indicated in the exception's message).

Important

This function can neither create, nor add to encrypted zip archives.

Example 5.12. 

With the files and folders as described in the description of the PREFIX parameter from above,

add-to-zip-archive( "/dist/files.zip", "/docs/",
  {
    { "/docs/feb/marketing.doc",
      { { PREFIX, "press/" } }
    },
    { "/docs/feb/techspec.doc",
      { { METHOD, STORED } }
    }
  } );

will add the zip entries

  • press/marketing.doc

  • feb/techspec.doc

to the existing zipfile /dist/files.zip, with the techspec.doc stored in the zip file without compression.

If the zip file does not yet exist at that location, it is created.


Example 5.13. Packaging an EPUB file (OCF)

Suppose you have created the desired file layout of your EPUB file under /epub/. A typical function call to package that layout into a my.epub file would then be:

create-zip-archive( "/my.epub", "/epub/",
  {
    { "/epub/mimetype",
      { { METHOD, STORED } } // first item in archive, no compression
    },
    { "/epub/META-INF/",
      { { METHOD, DEFLATED } }
    },
    { "/epub/OEBPS/",
      { { METHOD, DEFLATED } }
    }
  } );

4.2. add-to-zipfile (deprecated) 

add-to-zipfile(zipfile as String, pathPrefix as String, fileOrFolder as String) as Void

This function is deprecated and should no longer be used. It will be removed in a subsequent release.

For a replacement, use add-to-zip-archive().

4.3. copy-file

copy-file(fromFile as String, toFile as String) as Bool

fromFile (String): absolute path to file
toFile (String): absolute path to file

This function copies the file fromFile to the file toFile. It returns true when the copy was successful, false otherwise. The paths need to be absolute, but can be specified either using URL notation (preferred for platform independence) or in local file system naming convention.
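A call using URL notation could look like the following sketch. Both paths are hypothetical.

```upl
// Copies draft.rtf from the work folder to the archive folder.
// The call returns true when the copy was successful, false otherwise.
copy-file( "file:///C:/work/draft.rtf", "file:///C:/archive/draft.rtf" );
```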

4.4. create-zip-archive

create-zip-archive(zipfile as String, baseFolder as String, contents as List) as Void

zipfile (String)
baseFolder (String)
contents (List)

This function is identical to add-to-zip-archive() with the only exception that items are not added to an existing zip file, but that the zip file is created fresh from only the contents descriptions. A possibly existing zipfile is overwritten by the new contents.

For the full description, please see add-to-zip-archive().

4.5. create-zipfile (deprecated) 

create-zipfile(zipfile as String) as Void

This function is deprecated and should no longer be used. It will be removed in a subsequent release.

For a replacement, use create-zip-archive().

4.6. delete-file

delete-file(filepath as String) as Bool

filepath (String): absolute path to file

Deletes the file with absolute path filepath.

4.7. extract-zip-archive

extract-zip-archive(zipArchive as String, destFolder as String) as Void

zipArchive (String): the zip archive to extract
destFolder (String): the folder to extract to

Extracts the specified, ZIP-compressed file zipArchive to the folder destFolder.

Password protection / encryption is not supported.

It throws an EvalException when extraction fails for some reason (indicated in the exception's message).

Example 5.14. 

extract-zip-archive( "test.zip", "/tmp/" );

extracts the contents of the ZIP file test.zip to the folder /tmp.

extract-zip-archive( "report.docx", "/reports/" );

extracts the Word 2007 file report.docx (which actually is a zipped folder hierarchy of individual files) to the folder /reports, from which you can then pick out individual components such as the contained image files.


4.8. extract-zipfile (deprecated) 

extract-zipfile(zipfile as String, destFolder as String) as Numeric

This function is deprecated and should no longer be used. It will be removed in a subsequent release.

For a replacement, use extract-zip-archive().

4.9. file-exists (deprecated) 

file-exists(file as String) as Bool

This function is deprecated and should no longer be used. It will be removed in a subsequent release.

For a replacement, use fs-exists().

4.10. fs-copy

fs-copy(mode as Id, src as String, dest as String) as Bool

mode (Id): FILE-TO-FILE | FILE-TO-FOLDER | FOLDER-TO-FOLDER | CONTENTS-TO-FOLDER
src (String): absolute file path
dest (String): absolute file path

This function lets you copy file system objects (files or complete folders). You can choose from several modes specifying how the copy operation should be performed and how the passed absolute paths src and dest are to be interpreted.

FILE-TO-FILE

copies the file src to the file specified by dest. Note that dest must be a full filename path, not a path to just the folder where the file should wind up; use FILE-TO-FOLDER for that case. The advantage of this mode is that you can rename the file during the copy. If the destination file already exists, it is silently overwritten.

FILE-TO-FOLDER

copies the file src into the folder dest, keeping its original name. If a file system object by that name already exists in dest, it is silently overwritten.

FOLDER-TO-FOLDER

recursively copies the folder src into the folder dest, creating it and any intermediate folders if necessary. If a file system object by that name already exists in dest, it is silently overwritten.

CONTENTS-TO-FOLDER

recursively copies all file system objects within src into the folder specified by dest (creating it and any intermediate folders when necessary), replacing any objects that already exist there. The folder src will not be deleted.

Example 5.15. 

fs-copy( FILE-TO-FILE, "/usr/dev/draft.rtf", "/usr/project/final.rtf" );

copies the file draft.rtf into the project folder under the name final.rtf

fs-copy( FILE-TO-FOLDER, "/usr/dev/manual.rtf", "/usr/dist/doc/" );

copies the file manual.rtf into the doc folder

fs-copy( FOLDER-TO-FOLDER, "/usr/dev/doc", "/usr/dist" );

copies the complete doc folder into the dist folder

fs-copy( CONTENTS-TO-FOLDER, "/usr/dev/doc", "/usr/dist/documentation" );

copies the contents of the doc folder into the documentation folder
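
The difference between the four modes is mostly a question of how dest is interpreted. The following Python sketch illustrates that difference and is not upCast's implementation. The name fs_copy is hypothetical, and three points are assumptions because the text above does not specify them: missing destination folders are also created in the two file modes, an existing destination folder is merged into rather than replaced, and the return value is true on success.

```python
import shutil
from pathlib import Path


def fs_copy(mode: str, src: str, dest: str) -> bool:
    """Copy src to dest according to one of the four modes described above."""
    source, target = Path(src), Path(dest)
    try:
        if mode == "FILE-TO-FILE":
            # dest names the resulting file, so a rename is possible.
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(source, target)
        elif mode == "FILE-TO-FOLDER":
            # dest names a folder; the file keeps its original name.
            target.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(source, target / source.name)
        elif mode == "FOLDER-TO-FOLDER":
            # The folder itself ends up inside dest.
            shutil.copytree(source, target / source.name, dirs_exist_ok=True)
        elif mode == "CONTENTS-TO-FOLDER":
            # Only the children of src end up inside dest.
            shutil.copytree(source, target, dirs_exist_ok=True)
        else:
            raise ValueError(f"unsupported mode: {mode}")
    except OSError:
        return False
    return True
```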


4.11. fs-create

fs-create(type as Id, abspath as String) as Bool

type     Id      FILE | FOLDER | FILE-REPLACE | FOLDER-REPLACE
abspath  String  absolute file or folder path

This function lets you create a new file system object, i.e. a new empty file or a folder.

Parameter type specifies what to create using which semantics:

FILE

creates a new, empty file at the absolute file location specified by abspath if it does not already exist. If creation was successful or the file already exists at that location, true is returned, otherwise the result is false. When a file at that location already exists, it is not modified in any way, meaning that data already present in the file is not cleared.

FOLDER

creates a new, empty folder at the absolute location specified by abspath if it does not already exist. If creation was successful or the folder already exists at that location, true is returned, otherwise the result is false. When a folder at that location already exists, it is not modified in any way, meaning that files or folders it contains are not deleted.

FILE-REPLACE

similar to FILE, but deletes any existing object at that location beforehand, regardless of whether it is a file or a folder (including all of its contents!). Use with caution!

FOLDER-REPLACE

similar to FOLDER, but deletes any existing object at that location beforehand, regardless of whether it is a file or a folder (including all of its contents!). Use with caution!
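
The four types can be summarised as "create if absent" versus "delete, then create". The Python sketch below mirrors the semantics described above under stated assumptions and is not upCast's implementation: the names fs_create and _remove are hypothetical, and missing parent folders are not created because the text does not say whether they would be.

```python
import shutil
from pathlib import Path


def _remove(path: Path) -> None:
    """Delete whatever exists at path: a file, a link, or a whole folder tree."""
    if path.is_dir() and not path.is_symlink():
        shutil.rmtree(path)
    elif path.exists() or path.is_symlink():
        path.unlink()


def fs_create(kind: str, abspath: str) -> bool:
    """Create an empty file or folder following the four types described above."""
    path = Path(abspath)
    try:
        if kind in ("FILE-REPLACE", "FOLDER-REPLACE"):
            _remove(path)                # destructive: removes folders recursively
        if kind in ("FILE", "FILE-REPLACE"):
            if path.is_file():
                return True              # an existing file is left untouched
            path.touch(exist_ok=False)   # fails if a folder occupies the location
        elif kind in ("FOLDER", "FOLDER-REPLACE"):
            if path.is_dir():
                return True              # an existing folder is left untouched
            path.mkdir()                 # fails if a file occupies the location
        else:
            raise ValueError(f"unsupported type: {kind}")
    except OSError:
        return False
    return True
```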

4.12. fs-create

fs-create(type as Id, link as String, target as String) as Bool

type    Id      SYMLINK
link    String  path to create the link at
target  String  path to create the link to

System requirements

This function is currently only available when running upCast on:

  • Windows Vista (or newer)

  • Mac OS X

This function lets you create special file system objects in the file system.

Parameter type specifies what to create using which semantics:

SYMLINK

creates a symbolic link in the file system, with link specifying the absolute path of the link to create, and target being the (already existing) file or folder to have the symbolic link point to.

4.13. fs-delete

fs-delete(mode as Id, fsobject as String) as Bool

mode      Id      SELF | CONTENTS
fsobject  String  absolute file path

This function deletes a file system object (file or complete folder). The mode specifies how the deletion is performed and how the passed absolute path fsobject is interpreted.

SELF

deletes the file system object fsobject. This can be a file or a folder, in which case the deletion operation is performed recursively on its contents before it is deleted itself.

CONTENTS

recursively deletes all file system objects within the folder specified by fsobject, leaving you with an empty fsobject folder.

Example 5.16. 

fs-delete( SELF, "/usr/dev/draft.rtf" );

deletes the file draft.rtf

fs-delete( SELF, "/usr/dist/doc/" );

deletes the folder doc, including all of its contents

fs-delete( CONTENTS, "/usr/dev/doc/" );

deletes all file system objects (files or folders) within the doc folder, resulting in a now empty doc folder.
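
The two modes differ only in whether the named object itself survives. A minimal Python sketch of that distinction follows; it is not upCast's implementation, the name fs_delete is hypothetical, and returning true on success is an assumption.

```python
import shutil
from pathlib import Path


def fs_delete(mode: str, fsobject: str) -> bool:
    """Delete fsobject itself (SELF) or only what it contains (CONTENTS)."""
    path = Path(fsobject)

    def remove(item: Path) -> None:
        # Folders are removed recursively; files and links are unlinked.
        if item.is_dir() and not item.is_symlink():
            shutil.rmtree(item)
        else:
            item.unlink()

    try:
        if mode == "SELF":
            remove(path)
        elif mode == "CONTENTS":
            for child in list(path.iterdir()):
                remove(child)            # the folder itself stays, now empty
        else:
            raise ValueError(f"unsupported mode: {mode}")
    except OSError:
        return False
    return True
```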


4.14. fs-exists

fs-exists(fsobjPath as String) as Bool

fsobjPath  String  absolute path to file system object (file or folder)

This method checks whether the specified fsobjPath (which may be a file or a folder) exists in the local file system. The path must be absolute, but can be specified either using URL notation (preferred for platform independence) or in local file system naming convention.

4.15. fs-exists

fs-exists(fsobjPath as String, mode as Id) as Bool

fsobjPath  String  absolute path to file system object (file or folder)
mode       Id      NATIVE | CASE-SENSITIVE

This method checks whether the specified fsobjPath (which may be a file or a folder) exists in the local file system. The path must be absolute, but can be specified either using URL notation (preferred for platform independence) or in local file system naming convention.

You can choose by which mode the check for existence should be performed:

NATIVE

the file system's native check for existence is used. On case-insensitive file systems, this means that character case might not match with the passed value, i.e. a check for "C:\Test.doc" will return true on a standard Windows installation, even when the actual file in that location is named "C:\test.doc" or "C:\TEST.DOC".

CASE-SENSITIVE

the check is performed case-sensitively, regardless of the underlying file system being actually case-sensitive or not. This means that a check for "C:\Test.doc" will return false on a standard Windows installation when the actual file in that location is named "C:\test.doc".
Note that when you later create a file "C:\Test.doc", the existing file "C:\test.doc" will be overwritten in this case!
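
One way to obtain a case-sensitive answer on a case-insensitive file system is to compare each path component against the actual directory listing. The Python sketch below shows this idea; whether upCast performs the check this way is not stated in this section, and the name exists_case_sensitive is hypothetical.

```python
import os


def exists_case_sensitive(path: str) -> bool:
    """True only if every path component matches a directory entry exactly."""
    path = os.path.abspath(path)
    drive, rest = os.path.splitdrive(path)
    current = drive + os.sep
    for part in rest.split(os.sep):
        if not part:
            continue
        try:
            # os.listdir reports names with their stored character case.
            if part not in os.listdir(current):
                return False
        except OSError:
            return False
        current = os.path.join(current, part)
    return True
```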

4.16. fs-info

fs-info(path as String) as List

path  String  absolute path to the file system object to deliver info on

This function returns properties for the passed absolute file system object path as a map-like List with two-element Lists as elements, each of which represents a key-value pair.

The following keys may be present in the returned list:

Key                Type     Description
ACCESS_TIME        String   the time of last access in ISO 8601 format: YYYY-MM-DDThh:mm:ss[.s+]Z
CREATION_TIME      String   the time of creation in ISO 8601 format: YYYY-MM-DDThh:mm:ss[.s+]Z
EXECUTABLE         Bool     true when this is an executable, false otherwise
EXISTS             Bool     true when this file actually exists, false otherwise
GROUP_NAME         String   the name of the file group (POSIX)
HIDDEN             Bool     true when this file has set the hidden flag, false otherwise (POSIX)
LOGICAL_SIZE       Numeric  the logical size in bytes
MODIFICATION_TIME  String   the time of last modification in ISO 8601 format: YYYY-MM-DDThh:mm:ss[.s+]Z
OWNER_NAME         String   the name of the file owner (POSIX)
READABLE           Bool     true when this file is readable by the application, false otherwise
SIZE               Numeric  the logical size in bytes
TYPE               String   one of: "file", "folder", "symbolic-link"
WRITABLE           Bool     true when this file is writable, false otherwise

Note that the presence of any of the keys listed above in the result is not guaranteed. If a key is not present, its associated value could not be determined.

Example 5.17. 

fs-info( "/images/flower.jpg" )

might return:

{
  {MODIFICATION_TIME,"2011-12-17T17:31:07Z"},
  {CREATION_TIME,"2011-12-17T17:31:07Z"},
  {EXISTS,true},
  {EXECUTABLE,false},
  {ACCESS_TIME,"2015-07-06T14:23:28Z"},
  {READABLE,true},
  {SIZE,307605},
  {HIDDEN,false},
  {GROUP_NAME,"staff"},
  {LOGICAL_SIZE,307605},
  {WRITABLE,true},
  {OWNER_NAME,"chris"},
  {TYPE,"file"}
}
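
Since the result is a list of key-value pairs rather than a true map, a lookup has to scan the pairs and cope with absent keys. The Python sketch below models the result as a list of two-element lists (values taken from the example above); the helper name lookup is hypothetical.

```python
from typing import Any, Optional, Sequence


def lookup(info: Sequence[Sequence[Any]], key: str) -> Optional[Any]:
    """Return the value paired with key, or None when the key is absent."""
    for pair in info:
        if len(pair) == 2 and pair[0] == key:
            return pair[1]
    return None


# A subset of the pairs shown in the example above.
info = [
    ["MODIFICATION_TIME", "2011-12-17T17:31:07Z"],
    ["EXISTS", True],
    ["SIZE", 307605],
    ["TYPE", "file"],
]
size = lookup(info, "SIZE")
owner = lookup(info, "OWNER_NAME")   # not present in this subset
```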

4.17. fs-info

fs-info(path as String, key as Id) as Value

path  String  absolute path to the file system object to deliver info on
key   Id      one of: SIZE | LOGICAL_SIZE | OWNER_NAME | GROUP_NAME | READABLE | WRITABLE | EXECUTABLE | CREATION_TIME | MODIFICATION_TIME | ACCESS_TIME | TYPE | HIDDEN | EXISTS

This function returns the value of property key for the passed absolute file system object path.

The following keys may be queried on a file system object:

Key                Type     Description
ACCESS_TIME        String   the time of last access in ISO 8601 format: YYYY-MM-DDThh:mm:ss[.s+]Z
CREATION_TIME      String   the time of creation in ISO 8601 format: YYYY-MM-DDThh:mm:ss[.s+]Z
EXECUTABLE         Bool     true when this is an executable, false otherwise
EXISTS             Bool     true when this file actually exists, false otherwise
GROUP_NAME         String   the name of the file group (POSIX)
HIDDEN             Bool     true when this file has set the hidden flag, false otherwise (POSIX)
LOGICAL_SIZE       Numeric  the logical size in bytes
MODIFICATION_TIME  String   the time of last modification in ISO 8601 format: YYYY-MM-DDThh:mm:ss[.s+]Z
OWNER_NAME         String   the name of the file owner (POSIX)
READABLE           Bool     true when this file is readable by the application, false otherwise
SIZE               Numeric  the logical size in bytes
TYPE               String   one of: "file", "folder", "symbolic-link"
WRITABLE           Bool     true when this file is writable, false otherwise

When the requested value cannot be determined from the underlying file system, null is returned.

Example 5.18. 

fs-info( "/images/flower.jpg", SIZE )

might return:

307605

which is a value of type Numeric giving the logical size of "/images/flower.jpg" in bytes.


4.18. fs-move

fs-move(mode as Id, src as String, dest as String) as Bool

mode  Id      FILE-TO-FILE | FILE-TO-FOLDER | FOLDER-TO-FOLDER | CONTENTS-TO-FOLDER
src   String  absolute file path
dest  String  absolute file path

This function lets you move file system objects (files or complete folders) from one location to another. You can choose from several modes specifying how the move operation should be performed and how the passed absolute paths src and dest are to be interpreted.

FILE-TO-FILE

moves the file src to the file specified by dest. Note that dest must be a full file path, not just the path of the folder where the file should end up (use FILE-TO-FOLDER for the latter). The advantage of this mode is that you can rename the file during the move. If the destination file already exists, it is silently overwritten.

FILE-TO-FOLDER

moves the file src into the folder dest, keeping its original name. If a file system object by that name already exists in dest, it is silently overwritten.

FOLDER-TO-FOLDER

moves the folder src into the folder dest, creating it and any intermediate folders if necessary. If a file system object by that name already exists in dest, it is silently overwritten.

CONTENTS-TO-FOLDER

moves all file system objects within src into the folder specified by dest (creating it and any intermediate folders when necessary), replacing any objects that already exist there. The folder src itself will not be deleted.

Example 5.19. 

fs-move( FILE-TO-FILE, "/usr/dev/draft.rtf", "/usr/project/final.rtf" );

moves the file draft.rtf into the project folder under the (new) name final.rtf

fs-move( FILE-TO-FOLDER, "/usr/dev/manual.rtf", "/usr/dist/doc/" );

moves the file manual.rtf into the doc folder

fs-move( FOLDER-TO-FOLDER, "/usr/dev/doc", "/usr/dist" );

moves the complete doc folder into the dist folder

fs-move( CONTENTS-TO-FOLDER, "/usr/dev/doc", "/usr/dist/documentation" );

moves the contents of the doc folder into the documentation folder


4.19. get-image-information

get-image-information(imagefile as String) as List

imagefile  String  absolute path to image file to examine

This function returns image properties for imagefile in a map-like List with two-element Lists as elements, each of which represents a key-value pair.

The following keys may be present in the returned list:

Key       Type              Description
ERROR     String            the error message when an error occurred while gathering image info
HRES      Numeric (number)  horizontal resolution as specified in the image in dots per inch. When the image does not specify a horizontal resolution or it cannot be derived from absolute width and width in pixels, the value 0 is returned. In this case, you may want to assume a default resolution, e.g. 72dpi.
VRES      Numeric (number)  vertical resolution as specified in the image in dots per inch. When the image does not specify a vertical resolution or it cannot be derived from absolute height and height in pixels, the value 0 is returned. In this case, you may want to assume a default resolution, e.g. 72dpi.
ABSW      Numeric (length)  absolute width of the image
ABSH      Numeric (length)  absolute height of the image
PIXW      Numeric (number)  width of the image's pixmap in pixels
PIXH      Numeric (number)  height of the image's pixmap in pixels
URL       String            the full URL path to the examined image
FORMAT    String            the image type, one of: "unknown", "gif", "jpg", "png", "pict", "emf", "wmf"
SIZE      Numeric (number)  the size of the image file in bytes
HASH-MD5  String            the MD5 hash of the image file

Note that the presence of any of the keys listed above in the result is not guaranteed. If a key is not present, its associated value could not be determined.

Example 5.20. 

get-image-information( "/images/flower.jpg" )

might return:

{
  {URL,"file:///images/flower.jpg"},
  {SIZE,307605},
  {FORMAT,"jpg"},
  {HASH-MD5,"714e2fc5621938cf6864e56959ce3759"},
  {PIXW,640},
  {PIXH,480},
  {ABSW,12800tw},
  {ABSH,9600tw},
  {HRES,72},
  {VRES,72}
}
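
The values in this example are mutually consistent if the unit tw is read as twips (1/1440 inch), which is an assumption here. The Python sketch below works through the arithmetic and applies the 72 dpi fallback suggested for a reported resolution of 0; the function name is hypothetical.

```python
TWIPS_PER_INCH = 1440   # assumption: "tw" denotes twips, 1/1440 inch
DEFAULT_DPI = 72        # fallback suggested above when HRES or VRES is 0


def absolute_size_twips(pixels: int, dpi: float) -> float:
    """Convert a pixel extent to twips, falling back to 72 dpi for dpi == 0."""
    effective_dpi = dpi if dpi > 0 else DEFAULT_DPI
    return pixels * TWIPS_PER_INCH / effective_dpi


width = absolute_size_twips(640, 72)    # PIXW and HRES from the example
height = absolute_size_twips(480, 72)   # PIXH and VRES from the example
```

With the example's values this yields 12800 and 9600, matching ABSW and ABSH.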

4.20. get-path-component

get-path-component(path as String, component as Id) as String

path       String  a string to be interpreted as file path
component  Id      LOCAL | URL | LOCALPATH | URLPATH | LOCALNAME | URLNAME | LOCALEXTENSION | URLEXTENSION | LOCALBASENAME | URLBASENAME | LOCALBASENAMEPATH | URLBASENAMEPATH

This method extracts one component of a file path. The component can be one of:

LOCAL returns path in local file system format

URL returns path in URL format

LOCALPATH returns only the path component of path (without file name and without trailing file separator). If path is a folder, the value is returned unchanged.

URLPATH same as LOCALPATH, but returns the value in URL format

LOCALNAME returns only the file name component of path in local format

URLNAME same as LOCALNAME, but returns the value in URL format

LOCALEXTENSION returns the file extension of path in local format, or the empty string if it has no extension

URLEXTENSION same as LOCALEXTENSION, but returns the value in URL format

LOCALBASENAME returns the same value as LOCALNAME, but with the trailing dot and extension stripped if present

URLBASENAME same as LOCALBASENAME, but returns the value in URL format

LOCALBASENAMEPATH essentially LOCALPATH + LOCALBASENAME, i.e. path minus its extension (including the trailing dot)

URLBASENAMEPATH same as LOCALBASENAMEPATH, but returns the value in URL format

Example 5.21. 

Calls to get-path-component() will have the following results (as String):

get-path-component( "C:\Documents and Settings\upCast\The file.xml", LOCAL )

C:\Documents and Settings\upCast\The file.xml

get-path-component( "C:\Documents and Settings\upCast\The file.xml", URL )

file:///C:/Documents%20and%20Settings/upCast/The%20file.xml

get-path-component( "C:\Documents and Settings\upCast\The file.xml", LOCALPATH )

C:\Documents and Settings\upCast

get-path-component( "C:\Documents and Settings\upCast\The file.xml", URLPATH )

file:///C:/Documents%20and%20Settings/upCast

get-path-component( "C:\Documents and Settings\upCast\The file.xml", LOCALNAME )

The file.xml

get-path-component( "C:\Documents and Settings\upCast\The file.xml", URLNAME )

The%20file.xml

get-path-component( "C:\Documents and Settings\upCast\The file.x m l", LOCALEXTENSION )

x m l

get-path-component( "C:\Documents and Settings\upCast\The file.x m l", URLEXTENSION )

x%20m%20l

get-path-component( "C:\Documents and Settings\upCast\The file.xml", LOCALBASENAME )

The file

get-path-component( "C:\Documents and Settings\upCast\The file.xml", URLBASENAME )

The%20file

get-path-component( "C:\Documents and Settings\upCast\The file.xml", LOCALBASENAMEPATH )

C:\Documents and Settings\upCast\The file

get-path-component( "C:\Documents and Settings\upCast\The file.xml", URLBASENAMEPATH )

file:///C:/Documents%20and%20Settings/upCast/The%20file
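
The results listed above can be reproduced for Windows-style paths with a few lines of Python. This is an illustrative sketch, not upCast's implementation: the names to_url and path_components are hypothetical, and the special case for folders cannot be modelled because the sketch only manipulates strings and never consults the file system.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def to_url(text: str) -> str:
    """Percent-encode a path fragment, keeping '/' and ':' literal."""
    return quote(text, safe="/:")


def path_components(path: str) -> dict:
    """Compute all twelve components for a Windows-style file path."""
    p = PureWindowsPath(path)
    ext = p.suffix[1:]                         # extension without the dot
    base_path = p.with_suffix("") if ext else p
    return {
        "LOCAL": str(p),
        "URL": "file:///" + to_url(p.as_posix()),
        "LOCALPATH": str(p.parent),
        "URLPATH": "file:///" + to_url(p.parent.as_posix()),
        "LOCALNAME": p.name,
        "URLNAME": to_url(p.name),
        "LOCALEXTENSION": ext,
        "URLEXTENSION": to_url(ext),
        "LOCALBASENAME": p.stem,
        "URLBASENAME": to_url(p.stem),
        "LOCALBASENAMEPATH": str(base_path),
        "URLBASENAMEPATH": "file:///" + to_url(base_path.as_posix()),
    }
```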


4.21. is-file

is-file(path as String) as Bool

path  String  absolute path to file system object

Tests if the object found under the specified path is a file. The object must exist for the result to be correct.

4.22. is-filetype

is-filetype(type as Id, file as String) as Bool

type  Id      DOC | DOCX | RTF | WORD
file  String  absolute file path

Returns whether file is of the specified type or not. This function looks into the file's content for making the best possible decision, i.e. it does not rely on the file extension.

When file does not exist, false is returned.

When an unsupported type is specified, an exception is thrown.

The following values for type are supported:

DOC

test if file is a Microsoft Word *.doc binary file

DOCX

test if file is a Microsoft Word *.docx or *.docm file

RTF

test if file is an RTF (Rich Text Format) file (most often with an *.rtf extension)

WORD

test if file is a Microsoft Word file, i.e. either an RTF, DOC or DOCX file. This is a shortcut for (is-filetype(DOC, $file) or is-filetype(DOCX, $file) or is-filetype(RTF, $file)).

Example 5.22. 

is-filetype( DOC, "/test/somefile.ext" )

returns true when somefile.ext is a Microsoft Word binary file in the .doc file format, false otherwise.
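
How upCast inspects the content is not documented in this section. As a rough illustration of content-based detection, the Python sketch below checks well-known file signatures: the OLE2 compound file header for DOC, the "{\rtf" prefix for RTF, and a ZIP package containing word/document.xml for DOCX. These heuristics are assumptions (the OLE2 header, for example, is shared by other legacy Office formats), and the name is_filetype is used only for readability.

```python
import zipfile

# Header of OLE2 compound files, the container used by binary .doc files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def is_filetype(kind: str, file: str) -> bool:
    """Guess the file type from its content, ignoring the extension."""
    if kind == "WORD":
        return any(is_filetype(k, file) for k in ("DOC", "DOCX", "RTF"))
    if kind not in ("DOC", "DOCX", "RTF"):
        raise ValueError(f"unsupported type: {kind}")
    try:
        with open(file, "rb") as handle:
            head = handle.read(8)
        if kind == "DOC":
            return head == OLE2_MAGIC
        if kind == "RTF":
            return head.startswith(b"{\\rtf")
        # DOCX/DOCM: a ZIP package whose main part is usually word/document.xml.
        if not zipfile.is_zipfile(file):
            return False
        with zipfile.ZipFile(file) as package:
            return "word/document.xml" in package.namelist()
    except (OSError, zipfile.BadZipFile):
        return False   # a missing or unreadable file is reported as false
```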


4.23. is-folder

is-folder(path as String) as Bool

path  String  absolute path to file system object

Tests if the object found under the specified path is a folder. The object must exist for the result to be correct.

4.24. list-files

list-files(baseDir as String) as List

baseDir  String  absolute path to base directory

This method generates a list of all visible files that are direct children of baseDir in the file system hierarchy. Folders and hidden files are not included.

Each list element contains the absolute path to a found file as a URL string (file protocol).

4.25. list-files

list-files(baseDir as String, flags as Id) as List

baseDir  String  absolute path to base directory
flags    Id      F | D | H | FD | FH | DH | FDH

This method generates a list of all file system objects (i.e., only direct children) within baseDir in the file system hierarchy.

Each list element contains the absolute path to a found file system object as a URL string (file protocol).

The flags parameter selects which objects are included. Concatenate the desired flags (in any order) into a single Id:

F

include file objects

D

include folder (directory) objects

H

include hidden objects

Example 5.23. 

list-files( "/Users/test/", HF );

creates a list of all files in /Users/test/, including hidden files.

list-files( "/Users/test/", D );

creates a list of all folders in /Users/test/.

list-files( "/Users/test/", FDH );

creates a list of all files and folders in /Users/test/, including hidden files or folders.
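
The flag combinations can be read as a simple filter over the direct children of baseDir. The following Python sketch illustrates this reading and is not upCast's implementation: the name list_files is reused only for readability, treating names that start with a dot as hidden is an assumption (platforms define hidden differently), and the sorted order is chosen for reproducibility.

```python
import os
from pathlib import Path
from typing import List


def list_files(base_dir: str, flags: str = "F") -> List[str]:
    """List direct children of base_dir as file: URLs, filtered by flags."""
    if not flags or set(flags) - set("FDH"):
        raise ValueError(f"unsupported flags: {flags}")
    result = []
    with os.scandir(base_dir) as entries:
        for entry in sorted(entries, key=lambda item: item.name):
            # Assumption: dot-prefixed names stand in for "hidden".
            if entry.name.startswith(".") and "H" not in flags:
                continue
            is_wanted_dir = "D" in flags and entry.is_dir()
            is_wanted_file = "F" in flags and entry.is_file()
            if is_wanted_dir or is_wanted_file:
                result.append(Path(entry.path).absolute().as_uri())
    return result
```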


4.26. list-files-recursively

list-files-recursively(baseDir as String) as List

baseDir  String  absolute path to base directory

This method generates a list of all visible files that are descendants of baseDir in the file system hierarchy.

Each list element contains the absolute path to a found file as a URL string (file protocol). Invisible files or folders are neither included in the list nor traversed.

4.27. list-files-recursively

list-files-recursively(baseDir as String, flags as Id) as List

baseDir  String  absolute path to base directory
flags    Id      F | D | H | FD | FH | DH | FDH

This method generates a list of all file system objects that are descendants of baseDir in the file system hierarchy.

Each list element contains the absolute path to a found file system object as a URL string (file protocol).

The flags parameter selects which objects are included. Concatenate the desired flags (in any order) into a single Id:

F

include file objects

D

include folder (directory) objects

H

include hidden objects

Example 5.24. 

list-files-recursively( "/Users/test/", HF );

creates a list of all file descendants of /Users/test/, including hidden files and traversing hidden folders.

list-files-recursively( "/Users/test/", D );

creates a list of all descendant folders of /Users/test/.

list-files-recursively( "/Users/test/", FDH );

creates a list of all descendant files and folders under /Users/test/, including hidden files or folders, and traversing hidden folders.
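
The recursive variant adds one rule to the flag filter: hidden folders are only traversed when H is set. A Python sketch of that traversal rule follows; it is not upCast's implementation, the function name mirrors the UPL one only for readability, and dot-prefixed names again stand in for hidden objects as an assumption.

```python
import os
from pathlib import Path
from typing import List


def list_files_recursively(base_dir: str, flags: str = "F") -> List[str]:
    """List descendants of base_dir as file: URLs, filtered by flags."""
    include_hidden = "H" in flags
    result = []
    for folder, dirs, files in os.walk(base_dir):
        if not include_hidden:
            # Pruning dirs in place stops os.walk from entering hidden folders.
            dirs[:] = [name for name in dirs if not name.startswith(".")]
            files = [name for name in files if not name.startswith(".")]
        selected = (dirs if "D" in flags else []) + (files if "F" in flags else [])
        result.extend(Path(folder, name).absolute().as_uri() for name in selected)
    return sorted(result)
```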


4.28. move-file

move-file(src as String, dest as String) as Bool

src   String  absolute file path
dest  String  absolute file path

Moves the file src to a new location dest. This can be used either to move a file to another location or just to rename it.

4.29. read

read(filename as String, encoding as String) as String

filename  String  absolute file path
encoding  String  encoding used in file, e.g. "UTF-8"

This method allows you to read the contents of the specified file (absolute path in filename) and return it as a String.

You must also specify the encoding used in the file to be read (e.g. "UTF-8") so that the contents can be converted correctly into a Unicode-based UPL String. Supported values for encoding are the encodings supported by the Java runtime on which the application runs (see the respective JVM's documentation for a comprehensive list).

Example 5.25. 

variable $contents as String := "";
$contents := read( "file:/C:/readme.txt", "iso-8859-1" );

reads the contents of the file C:\readme.txt (assuming it is stored in ISO 8859-1 encoding) and stores it as a String in $contents.


4.30. readln

readln(filename as String, encoding as String) as List

filename  String  absolute file path
encoding  String  encoding used in file, e.g. "UTF-8"

This method allows you to read the contents of the specified text file (absolute path in filename) and return its lines as a List of Strings.

A line is considered terminated by any one of a line feed ('\n', U+000A), a carriage return ('\r', U+000D), or a carriage return followed immediately by a linefeed.

You must also specify the encoding used in the file to be read (e.g. "UTF-8") so that the contents can be converted correctly into Unicode-based UPL Strings. Supported values for encoding are the encodings supported by the Java runtime on which the application runs (see the respective JVM's documentation for a comprehensive list).

Example 5.26. 

variable $lines as List := {};
$lines := readln( "file:/C:/properties.txt", "iso-8859-1" );

reads the contents of the file C:\properties.txt (assuming it is stored in ISO 8859-1 encoding) and stores each line of text as a String in the List $lines.
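
The line-termination rule above (line feed, carriage return, or carriage return followed by line feed) can be expressed as a single regular expression. The Python sketch below applies it to a file read with an explicit encoding. It is not upCast's implementation: it takes a plain file path rather than a URL, and dropping the empty string after a final terminator is an assumption.

```python
import re
from typing import List

# CRLF must come first so that "\r\n" counts as one terminator, not two.
_LINE_END = re.compile(r"\r\n|\r|\n")


def readln(filename: str, encoding: str) -> List[str]:
    """Return the lines of a text file without their terminators."""
    # newline="" keeps the terminators as stored instead of translating them.
    with open(filename, encoding=encoding, newline="") as handle:
        text = handle.read()
    lines = _LINE_END.split(text)
    if lines and lines[-1] == "":
        lines.pop()   # assumption: a trailing terminator adds no extra line
    return lines
```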


4.31. relativize-uri

relativize-uri(absuri as String, baseuri as String) as String

absuri   String  absolute URI
baseuri  String  base URI

Relativizes the absolute URI absuri against baseuri (if possible). When absuri cannot be made relative to baseuri, it is returned unchanged.

This function operates on the passed values purely as strings: it neither dereferences the URI nor checks whether the result exists.

Example 5.27. 

relativize-uri( "file:/Users/johndoe/memo.txt", "file:/Users/johndoe/" )

will return "file:memo.txt".

relativize-uri( "file:/Users/johndoe/memo.txt", "file:/tmp/" )

will return "file:../Users/johndoe/memo.txt".

relativize-uri( "file:/Users/johndoe/memo.txt", "http://www.upcast.de/" )

will return "file:/Users/johndoe/memo.txt".
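
The three results above can be reproduced with a short Python sketch that compares scheme and authority and then computes a relative path. It is meant only to illustrate the documented behaviour, not to describe upCast's implementation; the function name is hypothetical, and query and fragment parts are ignored as a simplification.

```python
import posixpath
from urllib.parse import urlsplit


def relativize_uri(absuri: str, baseuri: str) -> str:
    """Make absuri relative to baseuri, or return it unchanged."""
    target, base = urlsplit(absuri), urlsplit(baseuri)
    if (target.scheme, target.netloc) != (base.scheme, base.netloc):
        return absuri   # different scheme or authority: nothing to relativize
    if not target.path:
        return absuri
    # A base ending in "/" names a folder; otherwise drop its last segment.
    if base.path.endswith("/"):
        base_dir = base.path
    else:
        base_dir = posixpath.dirname(base.path)
    relative = posixpath.relpath(target.path, base_dir or "/")
    # The scheme prefix follows the form of the results shown above.
    return f"{target.scheme}:{relative}"
```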


4.32. resolve-uri

resolve-uri(uri as String, baseuri as String) as String

uri      String  absolute or relative URI to resolve
baseuri  String  base URI

Resolves the (possibly) relative URI uri against the absolute URI baseuri to an absolute URI as result.

If uri is an absolute URI reference, it is returned unchanged.

This function operates on the passed values purely as strings: it neither dereferences the URI nor checks whether the result exists.

Example 5.28. 

resolve-uri( "memo.txt", "file:/Users/johndoe/" )

will return "file:/Users/johndoe/memo.txt".

resolve-uri( "file:/Users/johndoe/memo.txt", "file:/Users/melissa/" )

will return "file:/Users/johndoe/memo.txt".


4.33. resolve-uri

resolve-uri(uri as String) as String

uri  String  absolute or relative URI to resolve

Resolves the (possibly) relative URI uri against the base URI of the current pipeline or parameter document. The following equivalence holds:

resolve-uri( "memo.txt" ) = resolve-uri( "memo.txt", $pipeline:ParamBase )

If uri is an absolute URI reference, it is returned unchanged.

This function operates on the passed values purely as strings: it neither dereferences the URI nor checks whether the result exists.

See also: resolve-uri()

4.34. sourcefile-uri

sourcefile-uri() as String

This function returns the base URL of the UPL source code file from which the function is called.

For example, any #include(…) directives in the same source file will be resolved relatively to this base URL by the UPL scanner.

Example 5.29. 

In a UPL source code file /code/functions.upl, the call

sourcefile-uri()

will return "file:/code/functions.upl".


4.35. write

write(filename as String, encoding as String, mode as Id, data as String) as Void

filename  String  absolute file name
encoding  String  character encoding, e.g. "UTF-8"
mode      Id      WRITE | APPEND
data      String  the data to write

This method writes its data argument to the file named filename. You set the encoding to be used (e.g. "UTF-8") and the writing mode, which either replaces the existing content of that file (WRITE) or appends the data to its end (APPEND).

4.36. writeln

writeln(filename as String, encoding as String, mode as Id, data as String) as Void

filename  String  absolute file name
encoding  String  character encoding, e.g. "UTF-8"
mode      Id      WRITE | APPEND
data      String  the data to write

This method writes its data argument to the file named filename. The platform-specific line separator sequence is appended automatically.

You set the encoding to be used (e.g. "UTF-8") and the writing mode, which either replaces the existing content of that file (WRITE) or appends the data to its end (APPEND).
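
The relationship between write() and writeln() and the effect of the two modes can be sketched in Python as follows. This is an illustration under stated assumptions, not upCast's implementation: the functions take plain file paths, and writeln is modelled as write plus the platform line separator.

```python
import os


def write(filename: str, encoding: str, mode: str, data: str) -> None:
    """Write data to filename, replacing (WRITE) or extending (APPEND) it."""
    flag = {"WRITE": "w", "APPEND": "a"}[mode]   # KeyError for unknown modes
    # newline="" writes the data exactly as given, without translation.
    with open(filename, flag, encoding=encoding, newline="") as handle:
        handle.write(data)


def writeln(filename: str, encoding: str, mode: str, data: str) -> None:
    """Like write(), but appends the platform-specific line separator."""
    write(filename, encoding, mode, data + os.linesep)
```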

5. Grouping Functions

5.1. is-painted

is-painted(paintColor as Id) as Bool

paintColor  Id  paint id

Returns true when the context node is painted with the specified paintColor.

Important

This function does not report whether a painter of the respective color is set on that node; it only reports whether the node is already painted. Within a UPL execution, a node can only be painted by paint-adjacent(), paint-following(), paint-preceding(), or by set-painter() if and only if its list of specified painter types contains the type "this".

5.2. mark-end

mark-end(target as String, paintColor as Id) as Numeric

target      String  XPath selecting the nodes to place an end marker on
paintColor  Id      the paint id

This function places an end marker of the specified paintColor on all nodes selected by the target XPath 1.0 expression (evaluated relative to the context node).

The function returns the number of nodes the target expression selected.

For details, see the section on Painters in the upCast Manual.

Example 5.30. 

mark-end( "//uci:par[@uci:class='Last_Name']", name );

will place an end marker of color name on all paragraphs that have an assigned paragraph style of name "Last_Name".


5.3. mark-end

mark-end(paintColor as Id) as Void

paintColor  Id  the paint id

This function places an end marker of the specified paintColor on the context node.

For details, see the section on Painters in the upCast Manual.

Example 5.31. 

mark-end( address );

will place an end marker of color address on the context node.


5.4. mark-start

mark-start(paintColor as Id) as Void

paintColor  Id  the paint id

This function places a start marker of the specified paintColor on the context node.

For details, see the section on Painters in the upCast Manual.

Example 5.32. 

mark-start( address );

will place a start marker of color address on the context node.


5.5. mark-start

mark-start(target as String, paintColor as Id) as Numeric

target      String  XPath selecting the nodes to place a start marker on
paintColor  Id      the paint id

This function places a start marker of the specified paintColor on all nodes selected by the target XPath 1.0 expression (evaluated relative to the context node).

The function returns the number of nodes the target expression selected.

For details, see the section on Painters in the upCast Manual.

Example 5.33. 

mark-start( "//uci:par[@uci:class='First_Name']", name );

will place a start marker of color name on all paragraphs that have an assigned paragraph style of name "First_Name".


5.6. paint-adjacent

paint-adjacent(paintColor as Id, siblingCount as Numeric) as Numeric

paintColor    Id       the paint id
siblingCount  Numeric  maximum number of nodes to paint (the context node and at most (siblingCount - 1) following siblings)

For siblingCount >= 0:

The function paints the context node and (siblingCount - 1) following sibling nodes with the specified paintColor.

The function places a start marker of the specified paintColor on the context node, and an end marker of the specified paintColor on the last node (in following-sibling axis) that was painted.

For siblingCount < 0:

The function paints the context node and (abs(siblingCount) - 1) preceding sibling nodes with the specified paintColor.

The function places an end marker of the specified paintColor on the context node, and a start marker of the specified paintColor on the last node (in preceding-sibling axis) that was painted.

Note

This function, in contrast to e.g. set-painter(), immediately paints the respective nodes, which means the paint color can be queried by the is-painted() function.

The function returns the number of nodes actually painted (which may be smaller than the absolute value of siblingCount when there are fewer sibling nodes).

For details, see the section on Painters in the upCast Manual.

Example 5.34. 

paint-adjacent( address, 2 );

will paint the context node and its following sibling with color address immediately.

paint-adjacent( address, -2 );

will paint the context node and its preceding sibling with color address immediately.
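
The marker placement for positive and negative siblingCount is easy to get wrong, so the rules above are sketched here in Python on a plain list of sibling nodes. This is a model for illustration only, not upCast's implementation: nodes are dictionaries with hypothetical "paint", "start" and "end" sets, and a siblingCount of 0 is treated as painting nothing, which is an assumption.

```python
from typing import Dict, List, Set


def new_node() -> Dict[str, Set[str]]:
    """A stand-in for a document node: paint colors plus start/end markers."""
    return {"paint": set(), "start": set(), "end": set()}


def paint_adjacent(siblings: List[Dict[str, Set[str]]], index: int,
                   color: str, sibling_count: int) -> int:
    """Paint the node at index and its neighbours; return the number painted."""
    if sibling_count == 0:
        return 0   # assumption: the text above does not define this case
    step = 1 if sibling_count > 0 else -1
    # Clamp the range to the available siblings on the chosen axis.
    stop = max(-1, min(len(siblings), index + sibling_count))
    painted = list(range(index, stop, step))
    for i in painted:
        siblings[i]["paint"].add(color)
    # In document order the start marker comes first and the end marker last.
    first, last = min(painted), max(painted)
    siblings[first]["start"].add(color)
    siblings[last]["end"].add(color)
    return len(painted)
```

For a positive count the context node carries the start marker; for a negative count it carries the end marker, as described above.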


5.7. paint-following

paint-following(paintColor as Id, condition as BoolExpression, endMode as Id) as Numeric

paintColor  Id              the paint id
condition   BoolExpression  boolean expression to evaluate with the actual context later
endMode     Id              NONE | START | END

This function paints the contiguous run of following siblings of the context node for which condition matches with the specified paintColor. Painting stops at the first following sibling for which condition does not match, or when the end of the following-sibling axis of the context node is reached.

endMode can have the following values:

NONE

no special handling of the last painted node

START

the last node painted receives a start marker of the specified paintColor

END

the last node painted receives an end marker of the specified paintColor

The method returns the number of nodes that were painted.

Note

This function, in contrast to e.g. set-painter(), immediately paints the respective nodes, which means the paint color can be queried by the is-painted() function.

For details, see the section on Painters in the upCast Manual.

Example 5.35. 

paint-following( address, @css:font-family="Times", END );

will paint all contiguously following sibling nodes of the context node whose font-family is "Times". Additionally, the last painted node will get an end marker set as if by calling mark-end( address ) on it.
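
The stopping rule can be modelled in a few lines of Python. The sketch below is a model for illustration, not upCast's implementation: siblings are dictionaries, the condition is an ordinary Python callable instead of a BoolExpression, and the keys "paint", "start" and "end" are hypothetical.

```python
from typing import Callable, Dict, List


def paint_following(siblings: List[Dict], index: int, color: str,
                    condition: Callable[[Dict], bool],
                    end_mode: str = "NONE") -> int:
    """Paint the run of siblings after index for which condition holds."""
    painted = []
    for i in range(index + 1, len(siblings)):
        if not condition(siblings[i]):
            break   # the run must be contiguous, so stop at the first miss
        siblings[i].setdefault("paint", set()).add(color)
        painted.append(i)
    if painted and end_mode in ("START", "END"):
        # The last painted node receives the requested marker.
        siblings[painted[-1]].setdefault(end_mode.lower(), set()).add(color)
    return len(painted)
```

Note that a later sibling matching the condition is not painted once the run has been interrupted.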


5.8. paint-preceding

paint-preceding(paintColor as Id, condition as BoolExpression, endMode as Id) as Numeric

paintColor  Id              the paint id
condition   BoolExpression  boolean expression to evaluate with the actual context later
endMode     Id              NONE | START | END

This function paints the contiguous run of preceding siblings of the context node for which condition matches, using the specified paintColor. Painting stops at the first preceding sibling for which condition does not match, or when the end of the preceding-sibling axis of the context node is reached.

endMode can have the following values:

NONE

no special handling of the last painted node

START

a start marker of the specified paintColor is set on the last node painted

END

an end marker of the specified paintColor is set on the last node painted

The function returns the number of nodes that were painted.

Note

This function, in contrast to e.g. set-painter(), immediately paints the respective nodes, which means the paint color can be queried by the is-painted() function.

For details, see the section on Painters in the upCast Manual.

Example 5.36. 

paint-preceding( address, @uci:class="addr", START );

will paint all contiguously preceding sibling nodes of the context node whose class is "addr". Additionally, the last painted node will get a start marker set as if by calling mark-start( address ) on it.


5.9. set-paint-attr

set-paint-attr(paintColor as Id, attrQName as Id, attrValue as String) as Void

paintColor  Id      the paint id
attrQName   Id      qualified attribute name
attrValue   String  value to set

This method lets you set an attribute (with qualified name attrQName) with associated value attrValue on the context node that will be promoted to the immediate grouping parent of the context node during grouping (if that exists) for the specified paintColor.

When a sequence of nodes is grouped and several of them specify a value for the same attrQName, the grouping element receives the attrValue of the node that comes last in document order.

Paint attributes, though set on the grouped nodes, will not be serialized on the individual nodes, only on the created grouping element.

Example 5.37. 

set-paint-attr( address, uci:language, @xml:lang );

will create an additional attribute uci:language on the final address grouping element, whose value is the last value set for uci:language within the group, e.g.

<uci:block uci:type="address" uci:language="en">
   ...
</uci:block>
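The "last node in document order wins" rule for promoted paint attributes can be illustrated with a small Python model. This is a sketch under stated assumptions, not upCast code: promoted_value is a hypothetical helper, and each list entry stands for the paint attribute value set on one grouped node (None where a node sets no value).

```python
def promoted_value(values_in_document_order):
    """Return the value the grouping element would receive in this model.

    Hypothetical helper: the last node (in document order) that specifies
    a value determines the result; None means no node specified one.
    """
    result = None
    for value in values_in_document_order:
        if value is not None:
            result = value
    return result


# Five grouped nodes; the second and fourth set uci:language.
print(promoted_value([None, "de", None, "en", None]))  # prints: en
```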

5.10. set-paint-attr

set-paint-attr(target as String, paintColor as Id, attrQName as Id, attrValue as String) as Numeric

target      String  XPath 1.0 expression selecting node(s) to set paint attribute on
paintColor  Id      the paint id
attrQName   Id      qualified attribute name
attrValue   String  value to set

This method lets you set an attribute (with qualified name attrQName) with associated value attrValue on all nodes selected by the target XPath 1.0 expression (evaluated relative from the context node). That attribute will be promoted to the immediate grouping parent of the target node during grouping (if that exists) for the specified paintColor.

The function returns the number of nodes the target expression selected.

When a sequence of nodes is grouped and several of them specify a value for the same attrQName, the grouping element receives the attrValue of the node that comes last in document order.

Paint attributes, though set on the grouped nodes, will not be serialized on the individual nodes, only on the created grouping element.

5.11. set-paint-value

set-paint-value(target as String, paintColor as Id, name as Id, value as Value) as Numeric

target      String  XPath 1.0 expression selecting node(s) to set paint value on
paintColor  Id      the paint id
name        Id      value key
value       Value   value to set

This method lets you set a UPL value (with name name) with associated value value on all nodes selected by the target XPath 1.0 expression (evaluated relative to the context node). That UPL value will be promoted to the immediate grouping parent of the target node during grouping (if that exists) for the specified paintColor.

The function returns the number of nodes the target expression selected.

When a sequence of nodes is grouped and several of them specify a value for the same name, the grouping element receives the value of the node that comes last in document order.

Paint values are never serialized. They can be queried in subsequent UPL module runs on the grouping elements that have been created in the internal document tree using the get-value() function.

5.12. set-paint-value

set-paint-value(paintColor as Id, name as Id, value as Value) as Void

paintColor  Id     the paint id
name        Id     value key
value       Value  value to set

This method lets you set a UPL value (with name name) with associated value value on the context node that will be promoted to the immediate grouping parent of the context node during grouping (if that exists) for the specified paintColor.

When a sequence of nodes is grouped and several of them specify a value for the same name, the grouping element receives the value of the node that comes last in document order.

Paint values are never serialized. They can be queried in subsequent UPL module runs on the grouping elements that have been created in the internal document tree using the get-value() function.

Example 5.38. 

set-paint-value( address, cdata, string() );

will set a UPL value with the CDATA contents of the last grouped node on the grouping element for paintColor address, using the key cdata for later retrieval.


5.13. set-painter

set-painter(paintColor as Id, painterTypes as List) as Void

paintColor    Id    the paint id
painterTypes  List  list of painter types (as String)

This function lets you set a painter of specified paintColor and painterTypes on the context node.

paintColor is the value which will be used as the type attribute value of the grouping element created by a subsequent Grouper module. When you set a qualified name, the resulting value will be its expanded name.

painterTypes is an ordered (from start to end) list of painter types and fallback painter types to be used if a painter fails.

This method does not actually paint any nodes, unless painterTypes also contains the type "this", in which case the context node is immediately painted.

For details, see the section on Painters in the upCast Manual.

Example 5.39. 

set-painter( address, { "start-end", "this" } );

will place a painter of color address on the context node, with a preferred type of "start-end" and a fallback type of "this".


5.14. set-painter

set-painter(target as String, paintColor as Id, painterTypes as List) as Numeric

target        String  XPath selecting the nodes to set the painter(s) on
paintColor    Id      the paint id
painterTypes  List    list of painter types (as String)

This function lets you set a painter of specified paintColor and painterTypes on all nodes selected by the target XPath 1.0 expression (evaluated relative from the context node).

The function returns the number of nodes the target expression selected.

paintColor is the value which will be used as the type attribute value of the grouping element created by a subsequent Grouper module. When you set a qualified name, the resulting value will be its expanded name.

painterTypes is an ordered (from start to end) list of painter types and fallback painter types to be used if a painter fails.

This method does not actually paint any nodes, unless painterTypes also contains the type "this", in which case the respective node is immediately painted.

For details, see the section on Painters in the upCast Manual.

Example 5.40. 

set-painter( "//uci:par[@uci:class='Address']", address, { "start-end", "this" } );

will place a painter of color address on paragraphs that have an assigned paragraph style of name "Address", with a preferred type of "start-end" and a fallback type of "this".


6. Graphical UI functions

6.1. open-url-in-browser

open-url-in-browser(url as String) as Bool

url  String  the URL to open

Opens the passed url in the system's default browser.

6.2. set-progress

set-progress(curval as Numeric, maxval as Numeric) as Void

curval  Numeric  current value
maxval  Numeric  max value

Lets you set progress information for the currently running (UPL-) module. The current state is defined as the current progress value curval compared to the maximum progress value maxval.

Example 5.41. 

set-progress( 4.0, 8.0 );

sets the progress bar to 50% of the currently running task's full duration.


6.3. set-ui-text

set-ui-text(elementId as Id, labeltext as String) as Void

elementId  Id      PROGRESS-LABEL | PROGRESS-SUBLABEL
labeltext  String  the text to show

This method lets you set the text in various components of upCast's user interface. The text is updated immediately in the UI, so you can e.g. provide the user with more detailed progress information while a lengthy UPL code sequence is taking place.

You specify the element for which you want to set the text using a symbolic constant in elementId, and the text is passed in the labeltext parameter.

The following symbolic constants are available:

PROGRESS-LABEL

sets the main label of the progress bar in upCast's pipeline window (at the lower left)

PROGRESS-SUBLABEL

sets the sub-label of the progress bar in upCast's pipeline window (at the lower right). Note that this label may overlap the PROGRESS-LABEL when both labels' texts are sufficiently long.

Example 5.42. 

The code

function initialize() as Numeric {
  set-ui-text( PROGRESS-SUBLABEL, "Initializing UPL...");
}

will display the text "Initializing UPL..." in the progress bar's sub-label when the UPL module starts to execute.


6.4. show-dialog

show-dialog(dialogType as Id, windowTitle as String, dialogText as String, buttonDescription as String) as Numeric

dialogType         Id      PLAIN | INFO | QUESTION | WARNING | ERROR
windowTitle        String  window title to display
dialogText         String  dialog text
buttonDescription  String  button description format string

This function displays a customizable dialog with up to three buttons.

The dialogType determines the overall style of the dialog:

PLAIN

a plain dialog

INFO

a dialog with an info-type icon

QUESTION

a dialog that asks the user a question, usually with a question mark icon

WARNING

a dialog that warns the user, usually with an exclamation mark icon

ERROR

a dialog that informs the user of an error, usually with a stop-sign icon

You can specify the dialog window's title in the parameter windowTitle.

The body text of the dialog is passed in the dialogText parameter.

Finally, the buttonDescription parameter takes a specially formatted string specifying the button(s) you want to be displayed, and which one should be the default button. The syntax is as follows:

buttonDescription ::= button ('|' button){0,2}
           button ::= [*]?[^|]+

In other words, button text specifications are separated by the pipe character '|', and you specify the default button by prefixing it with an asterisk '*'. The maximum number of buttons is 3. Note that buttons are specified in a virtual OK, Cancel, Alternative order, which means that depending on the OS you are running on, the displayed order of the buttons may vary from the specification order.
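To make the buttonDescription grammar concrete, here is a minimal Python sketch of a parser for it. It is a model of the grammar above, not upCast's own parser: the function name parse_button_description is hypothetical, and the returned default position is simply the 1-based position in specification order (how that position relates to the number returned by show-dialog() is not assumed here).

```python
import re

# button ::= [*]?[^|]+  (an optional leading asterisk marks the default button)
BUTTON = re.compile(r"([*]?)([^|]+)")


def parse_button_description(description):
    """Split a buttonDescription into labels and the default button position.

    Hypothetical helper. Returns (labels, default_position); the position is
    1-based in specification order, or None if no button is marked with '*'.
    """
    parts = description.split("|")
    if not 1 <= len(parts) <= 3:
        raise ValueError("a button description holds 1 to 3 buttons")
    labels, default_position = [], None
    for position, part in enumerate(parts, start=1):
        match = BUTTON.fullmatch(part)
        if match is None:
            raise ValueError("empty button text at position %d" % position)
        if match.group(1):
            default_position = position
        labels.append(match.group(2))
    return labels, default_position


print(parse_button_description("OK|*Cancel|Abort"))
# prints: (['OK', 'Cancel', 'Abort'], 2)
```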

The function returns the number of the button clicked. Closing the dialog by its window decoration (instead of one of its buttons) returns the value 1.

The show-dialog() function will only work when running in a GUI environment. When running in command-line or API mode where there is no GUI available, the function immediately returns the value -1.

Example 5.43. 

show-dialog( WARNING, "Warning Dialog", "This is a warning dialog.", "OK|*Cancel|Abort");

will display the following dialog when running in GUI mode:

[Figure: resulting dialog]


7. Functions on Lists

7.1. append

append(list as List, element as Value) as List

list     List   list to append a value to
element  Value  value to append

Appends the value in element to the List list so that it becomes the new last element.

The function returns the value passed in list, which is the List modified as described.

7.2. append-all

append-all(list as List, appendList as List) as List

list        List  list to append to
appendList  List  list of values to append (in the given order)

Appends all elements in appendList to the List list so that they become the new last elements. The elements are appended in the order they are stored in appendList.

The function returns the value passed in list, which is the List modified as described.

7.3. count

count(list as List) as Numeric

list  List  a list to count the number of elements of

Returns the number of elements in list.

7.4. distinct-values

distinct-values(list as List) as List

list  List

Returns list with duplicate values removed.

The de-duplicated list contains the single values in the order of their first occurrence in the original list.

Example 5.44. 

distinct-values( { 2.0in, 1cm, "1cm", 10mm, 5.08cm } )

will return

{ 2in, 1cm, "1cm" }

(semantically; note that lengths are normalized to twips internally)
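The effect of unit normalization on de-duplication can be reproduced with a short Python model. This is only a sketch under assumptions: lengths are represented as hypothetical (amount, unit) tuples, strings stay strings, and exact fractions are used so that 10mm and 1cm compare equal after conversion to twips (1 inch = 1440 twips = 2.54 cm). It does not claim to mirror upCast's internal value representation.

```python
from fractions import Fraction

# Twips per unit; 1 inch is 1440 twips and exactly 2.54 cm.
TWIPS_PER_UNIT = {
    "in": Fraction(1440),
    "cm": Fraction(1440) / Fraction("2.54"),
    "mm": Fraction(1440) / Fraction("25.4"),
}


def to_twips(amount, unit):
    """Convert a length given as decimal string plus unit to exact twips."""
    return Fraction(amount) * TWIPS_PER_UNIT[unit]


def distinct_values(values):
    """Keep the first occurrence of each value, comparing lengths in twips."""
    seen, result = [], []
    for value in values:
        key = to_twips(*value) if isinstance(value, tuple) else value
        if key not in seen:
            seen.append(key)
            result.append(value)
    return result


values = [("2.0", "in"), ("1", "cm"), "1cm", ("10", "mm"), ("5.08", "cm")]
print(distinct_values(values))
# prints: [('2.0', 'in'), ('1', 'cm'), '1cm']
```

The String "1cm" survives because it is not a length and therefore never equals the normalized length 1cm in this model.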


7.5. filter-list

filter-list(filterType as Id, source as List, param as List) as List

filterType  Id    EXCLUDE | KEEP
source      List
param       List

Returns a List containing every element from source that satisfies the specified filterType with parameter param.

List elements are returned in the same order as they were in source. Duplicates are not removed.

The following filterTypes are currently implemented:

EXCLUDE

excludes (=removes) all those elements from source that are equal to some element in param

KEEP

keeps only those elements from source that are equal to some element in param

Example 5.45. EXCLUDE example

filter-list( EXCLUDE, { "a", "car", "is", "a", "vehicle" }, {"is", "a", "the"} )

will return

{ "car", "vehicle" }

Note that duplicates are not removed from the result, so

filter-list( EXCLUDE, { "a", "car", "is", "a", "car", "is", "a", "vehicle" }, {"is", "a"} )

will return

{"car","car","vehicle"}

Example 5.46. KEEP example

filter-list( KEEP, { "a","car","is","a","car","is","a","vehicle" }, {"is","a"} )

will return

{"a","is","a","is","a"}
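The EXCLUDE and KEEP semantics shown in the examples above can be expressed compactly in Python. This is an illustrative model, not upCast's implementation: filter_list is a hypothetical name, filter types are passed as plain strings, and Python's == stands in for UPL value equality, which may differ for mixed types.

```python
def filter_list(filter_type, source, param):
    """Model of filter-list(): order is preserved, duplicates are kept."""
    if filter_type == "EXCLUDE":
        return [item for item in source if item not in param]
    if filter_type == "KEEP":
        return [item for item in source if item in param]
    raise ValueError("unknown filter type: %r" % (filter_type,))


words = ["a", "car", "is", "a", "car", "is", "a", "vehicle"]
print(filter_list("EXCLUDE", words, ["is", "a"]))  # prints: ['car', 'car', 'vehicle']
print(filter_list("KEEP", words, ["is", "a"]))     # prints: ['a', 'is', 'a', 'is', 'a']
```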

7.6. flatten

flatten(source as List) as List

source  List  a list of arbitrary values

Returns a List where all items in source of type List have been (recursively) flattened.

A List that has one or more Lists as its members is considered flattened when all its items of type List have been replaced by the individual elements of that List in order, possibly recursively. The result is a List that no longer contains any elements of type List. This is best shown with an example.

Example 5.47. 

flatten( { a, { b1, b2 }, c } )

returns { a, b1, b2, c }

flatten( { a, { b1, { b2i, b2ii }, b3 }, c, { d1, d2 } } )

returns { a, b1, b2i, b2ii, b3, c, d1, d2 }

flatten( { a, b, c } )

returns { a, b, c }, that is the list is returned unchanged because there aren't any sub-lists.
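The recursive definition of flattening corresponds to the following Python sketch. It models UPL Lists as Python lists and Ids as strings, both of which are assumptions made for illustration; it is not upCast's implementation.

```python
def flatten(source):
    """Replace every nested list by its elements, recursively and in order."""
    result = []
    for item in source:
        if isinstance(item, list):
            result.extend(flatten(item))
        else:
            result.append(item)
    return result


nested = ["a", ["b1", ["b2i", "b2ii"], "b3"], "c", ["d1", "d2"]]
print(flatten(nested))
# prints: ['a', 'b1', 'b2i', 'b2ii', 'b3', 'c', 'd1', 'd2']
```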


7.7. index-of

index-of(list as List, value as Value) as Numeric

list   List   a list
value  Value  value to search for in the list

Returns the index within list of the first occurrence of value. If value is not a member of list, -1 is returned.

7.8. is-in

is-in(refValue as Value, valueList as List) as Bool

refValue   Value  Value to look for
valueList  List   List of values

Determines if the List valueList contains an element that, after having been cast to a String as if applying the to-string() function, is equal to the supplied refValue after it has been cast to a String as if applying the to-string() function.

7.9. remove

remove(list as List, index as Numeric) as List

list   List     a list
index  Numeric  index of element to remove from the list (1-based)

Removes the element with index index from the List list.

The function returns the value passed in list, which is the List modified as described.

7.10. set-value-at

set-value-at(list as List, pos as Numeric, val as Value) as List

list  List     a list
pos   Numeric  index of element to set (1-based)
val   Value    value to set

Sets the list element at index pos to the value val.

When the list currently does not have an element at index pos, it is increased in size automatically until that index exists, with any newly added elements set to null.

The function returns list with the modification applied.

Example 5.48. 

set-value-at( { a, b, y }, 3, z )

returns the list { a, b, z }
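The automatic growth described above is easiest to see with an index beyond the current end of the list. The following Python model is a sketch, not upCast code: set_value_at is a hypothetical name, None stands in for UPL's null, the list is modified in place (an assumption), and rejecting positions below 1 is also an assumption since the behaviour for such positions is not specified here.

```python
def set_value_at(values, pos, val):
    """Set the element at 1-based index pos, padding with None as needed."""
    if pos < 1:
        # Assumption: positions below 1 are treated as an error in this model.
        raise ValueError("pos is 1-based and must be at least 1")
    while len(values) < pos:
        values.append(None)
    values[pos - 1] = val
    return values


print(set_value_at(["a", "b", "y"], 3, "z"))  # prints: ['a', 'b', 'z']
print(set_value_at(["a"], 4, "d"))            # prints: ['a', None, None, 'd']
```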


7.11. sort

sort(list as List, order as Id) as List

list   List
order  Id    ASCENDING | DESCENDING

This function returns a sorted copy of the passed list.

The sorting order can be specified by the order parameter:

ASCENDING

the list's values are sorted in ascending order (from lowest to highest)

DESCENDING

the list's values are sorted in descending order (from highest to lowest)

Example 5.49. 

sort( {3,7,5,2,8}, ASCENDING )

will return the new list

{2,3,5,7,8}

and

sort( {"Eins","zwei",drei,4}, DESCENDING )

will return the new list

{"zwei","Eins",drei,4}

7.12. value-at

value-at(list as List, index as Numeric) as Value

list   List     a list
index  Numeric  element index (1-based)

Extracts the value at position index of the passed list. The index is 1-based, i.e. the first element has index 1, the second has index 2 etc.

If the requested index position does not exist, an EvalException is thrown.

8. Logging functions

8.1. clear-log-messages

clear-log-messages(logRealm as Id) as Void

logRealm  Id  PIPELINE | MODULE | customlogger

This function clears the internally collected log messages. With the logRealm parameter, you decide which log event collector to clear. Possible values are:

PIPELINE

all log events for the currently executing pipeline (this also includes the log events created by the currently running module, accessible separately by the logger MODULE, and all modules run earlier in the pipeline)

MODULE

all log events created during execution of the currently running module

<name>

log events created by the custom logger named name (see also start-logger())

This function can be useful if you want to periodically clear collected log messages you no longer need in long-running or looping pipelines to prevent excessive or indefinitely increasing memory usage.

8.2. create-log-writer (deprecated) 

create-log-writer(context as Id, filter as String, filename as String, mode as Id) as Value

This function is deprecated and should no longer be used. It will be removed in a subsequent release.

As a replacement, use the create-log-writer() variant that takes the additional type parameter.

8.3. create-log-writer

create-log-writer(context as Id, filter as String, filename as String, mode as Id, type as Id) as Value

context   Id      MODULE | PIPELINE | APPLICATION
filter    String  log filter string
filename  String  log file name
mode      Id      WRITE | APPEND
type      Id      TEXT | XML

This function creates a custom logfile writer instance. It is tied to the currently executing pipeline thread, i.e. this logger is independent from other pipeline instances running at the same time (even when they were created from the same pipeline document).

The context parameter specifies to which execution context the logger is bound. The context can be thought of as the running instance filter for log messages created in the upCast application:

MODULE

The context is the module in which the logger was created. When the module exits, the logger is automatically discarded. Only log messages created in the context of the execution of this module instance will be passed to that logger's message filter.

PIPELINE

The context is the pipeline in which the logger was created. A logger created in a module has its parent pipeline as context pipeline. When the pipeline exits, the logger is automatically discarded. Only log messages created in the context of the execution of this pipeline instance will be passed to that logger's message filter.

APPLICATION

The context is the complete upCast application in which the logger was created. Note that only when the whole application exits (which normally is equivalent to exiting the running Java VM), the logger is automatically discarded. This usually means that loggers created with APPLICATION context will exist forever within the execution lifetime of the application, and you must create them with extreme care and/or discard them explicitly at appropriate places (using discard-log-writer()).

The set of log messages to be written to the log writer is constructed by applying the specified log message filter expression filter to the set of available log messages. The syntax for filter is defined here.

filename is the absolute path to the file to write the log messages to. There are two special values for this parameter which output directly to the special system streams stdout and stderr (which will appear e.g. in the host system's console window):

upcast:stdout

writes the output of the log writer to the standard output stream (stdout)

upcast:stderr

writes the output of the log writer to the standard error stream (stderr)

mode specifies whether log messages should be appended to the contents of filename (if that already exists), or if the file should be cleared at logger creation time:

APPEND

append log messages to the file filename (if it already exists) for this instance

WRITE

clear a possibly existing file filename before writing log messages to it for this instance

The mode parameter is irrelevant for output to stdout and stderr (upcast:stdout and upcast:stderr special destinations), for which APPEND is always assumed.

type specifies the format of the file written. Currently implemented types are:

TEXT

the log messages are serialized as UTF-8 encoded text to a plain text file.

XML

the log messages are serialized to an XML file. Note that the created XML file is only parseable after the logger has been discarded either automatically by exiting the scope for which it was defined, or by calling discard-log-writer() explicitly. This is because as a finalizing action before closing the file, the closing XML tag needs to be written to make the XML file well-formed.

The function returns a unique handle (log writer id of type Id) identifying this log writer instance. You must use this handle when you want (or in the case of the APPLICATION context, need) to explicitly discard (=close and detach) a log writer from the central application logging hub and cannot or do not want to rely on the automatic cleanup.

The function returns null (of type Null) when the log file or log writer could not be created.

Technically, this function appends a logfile writer object to the application's central logging hub (the one which the application log file and the live log window also attach to), with the difference that these loggers implicitly perform a context-based pre-filtering of the received log messages by the instance (not: type) of the component (module or pipeline) that created the logger. Only then is the usual log message filtering applied.

Note

Such a mechanism is useful to perform highly granular (i.e. at module level) logging with varying levels (log message filters) for debugging purposes.

Sometimes, you'll need DETAIL-level debugging only for a certain component in your pipeline and setting this level for the whole pipeline would be impractical due to the large amount of messages created. With a file logger created with this function, you can restrict DETAIL logging output to a single module instance, and you have that info available in a persistent file for later debugging.

Where to create log writers?

To catch as much of the log output of a module as possible, you'll need to create a log writer as early as possible in a module.

The recommended place for this is right at the top of the custom initialization function of a module or pipeline.

Example 5.50. 

$pipeline:logWriter := create-log-writer( MODULE, "ALL -WARN -INFO", $pipeline:PipelineBase + "mylog.txt", APPEND, TEXT );

creates a log writer appending log messages of the current module to the file mylog.txt, which resides in the same folder as the pipeline document containing the module.

All log messages are written to the file except for those with level WARN and INFO.


8.4. discard-log-writer

discard-log-writer(name as Id) as Bool

name  Id  log writer id

This function explicitly discards a log writer created earlier with create-log-writer(). It implicitly closes the output stream for the file this log writer writes to. You need to pass the handle that create-log-writer() returned when the writer was created.

The function returns true when the log writer existed and could be discarded, false otherwise.

8.5. forward-log-message

forward-log-message(level as Id, messagecode as Numeric, logmessage as Value, ...) as String

level        Id       TRACE | DETAIL | VERBOSE | DEBUG | INFO | WARN | ERROR | FATAL
messagecode  Numeric  numeric message code to use (≥0)
logmessage   Value    log message value(s)

This method lets you create a new custom log message and place it into the logical parent component (module or pipeline) of the current component (module or pipeline).

If the logical parent is the application (in other words: if called from the top-level pipeline component), this method does nothing.

In level, you specify the log message level you want to set for that message. This can be any of the following levels: FATAL, ERROR, WARN, INFO, DEBUG, VERBOSE, DETAIL, TRACE.

messagecode lets you specify a custom message code. You can use this to find your own specific messages in a logger later using the get-log-messages() function by specifying the respective codes in its includeCodes list.

Important

Your custom message codes must be greater than 0. Any negative codes are reserved by upCast for its own error message constants.

See also: de.infinityloop.msg.Msg

Finally, you can add an arbitrary list of Value objects to be output as the logmessage.

The function returns a unique identifier for the generated log message in the form mid<integer>.

Example 5.51. 

forward-log-message( WARN, 5, "The value ", $number, " is not equal to 5." );

will create a log message with level WARN and message code 5 in the logical parent with the concatenated string representations of the remaining Value objects in the specified order.


8.6. forward-log-messages

forward-log-messages(levels as List, includeCodes as List, excludeCodes as List) as Numeric

levels        List  TRACE | DETAIL | VERBOSE | DEBUG | INFO | WARN | ERROR | FATAL
includeCodes  List  list of message codes (each one either an Id or a Numeric) to include
excludeCodes  List  list of message codes (each one either an Id or a Numeric) to exclude

This method forwards (meaning: copies into) the specified set of log messages collected in the current component (which might be a module or pipeline) to its logical parent (which again might be a module or pipeline).

If the logical parent is the application (in other words: if called from the top-level pipeline component), this method does nothing.

With levels, you specify the list of log message levels you are interested in. This can be any of the following levels: FATAL, ERROR, WARN, INFO, DEBUG, VERBOSE, DETAIL, TRACE. If the list is the empty list, all levels are considered.

Each log message has a numerical code, which is defined in the class de.infinityloop.msg.Msg. With includeCodes, you can specify the list of numerical codes or symbolic names (ids) that should be included in the result. With excludeCodes, you can specify the list of numerical codes or symbolic names (ids) that should be excluded from the result. In both cases, when the list is the empty list, no respective filtering is applied.

The algorithm (=order) in which the described filters are applied is as follows:

  1. From the set of available log messages, only those are considered that match any of the levels in levels. When levels is the empty list, no filtering is applied.

  2. Adding to the subset created in step 1, those log messages are included whose numerical message code matches any of the codes listed in includeCodes. When includeCodes is the empty list, no filtering is applied.

  3. From the subset created in step 2, all those log messages are removed whose numerical message code matches any of the codes listed in excludeCodes. When excludeCodes is the empty list, no filtering is applied.

The function returns the number of log messages that matched the selection criteria and were therefore forwarded to this component's logical parent component.
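The three filter steps can be traced with a small Python model. This is one possible reading of the algorithm, not upCast's implementation: step 2 is interpreted literally as adding messages with matching codes to the level-selected subset (a union), the message dictionaries and the name select_messages are hypothetical, and symbolic message names are left out so that only numeric codes are compared.

```python
def select_messages(messages, levels, include_codes, exclude_codes):
    """Apply level, include and exclude filtering in the described order.

    Assumption: step 2 adds messages (union) rather than narrowing the set.
    """
    selected = []
    for message in messages:
        # Step 1: level match; an empty levels list selects every message.
        level_ok = not levels or message["level"] in levels
        # Step 2: messages with an included code are added to the subset.
        included = message["code"] in include_codes
        # Step 3: excluded codes are removed from the combined subset.
        excluded = message["code"] in exclude_codes
        if (level_ok or included) and not excluded:
            selected.append(message)
    return selected


messages = [
    {"level": "ERROR", "code": -12},
    {"level": "WARN", "code": -3},
    {"level": "INFO", "code": 7},
    {"level": "DEBUG", "code": -14},
]
result = select_messages(messages, ["ERROR", "WARN"], [], [-12, -14])
print([m["code"] for m in result])  # prints: [-3]
```

With levels set to ERROR only and includeCodes set to 7, this model would select the ERROR message and, in addition, the INFO message with code 7.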

Example 5.52. 

The call

forward-log-messages( {ERROR, WARN}, {}, { UserCatalogSettingError, DTDUpdateError});

forwards all log messages that are of type ERROR or WARN with the exception of messages whose symbolic name is UserCatalogSettingError or DTDUpdateError.


8.7. forward-log-messages

forward-log-messages(filterexp as String) as Numeric

filterexp  String  log filter expression

This method forwards (meaning: copies into) the specified set of log messages collected in the current component (which might be a module or pipeline) to its logical parent (which again might be a module or pipeline).

If the logical parent is the application (in other words: if called from the top-level pipeline component), this method does nothing.

The set of log messages to be forwarded is constructed by applying the specified message filter expression filterexp to the set of available log messages. The syntax for filterexp is defined here.

The function returns the number of log messages that matched the selection criteria and were therefore forwarded to this component's logical parent component.

Example 5.53. 

The call

forward-log-messages( "+ERROR +WARN -UserCatalogSettingError -DTDUpdateError" );

forwards all log messages that are of type ERROR or WARN with the exception of messages whose symbolic name is UserCatalogSettingError or DTDUpdateError.


8.8. get-log-message-component

get-log-message-component(logMessage as List, component as Id) as Value

logMessage  List  a single log message represented as a list of components as delivered by get-log-messages()
component   Id    TIMESTAMP | LEVEL | CODE | CONTEXT | MESSAGE | FULLMESSAGE | ID

This method can extract a certain component of a logMessage's opaque structure as returned by either variant of the get-log-messages() function.

The component to extract can be one of:

TIMESTAMP

the timestamp of the log message as String in format "dd MMM yy HH:mm:ss.SSS" using locale US

LEVEL

the level of the log message as Id. One of: FATAL, ERROR, WARN, INFO, DEBUG, VERBOSE, DETAIL, TRACE

CODE

the numeric message code as Numeric

CONTEXT

the context information of where the message originates as String

MESSAGE

the actual, raw, textual message of the log message as String

FULLMESSAGE

the full log message as formatted by upCast and as it is written to log files. This includes all previously described components and is returned as String.

ID

a unique String value identifying this particular log message. It has the format mid<integer>. This id is also available in log files created by a log writer outputting in XML format.

Example 5.54. 

To extract the raw message text of the first collected log message of the currently executing module, use:

get-log-message-component( value-at( get-log-messages( MODULE, "ALL" ), 1 ), MESSAGE )

8.9. get-log-messages

get-log-messages(name as Id, levels as List, includeCodes as List, excludeCodes as List) as List

name          Id    PIPELINE | MODULE | any named logger created with start-logger()
levels        List  FATAL | ERROR | WARN | INFO | DEBUG | VERBOSE | DETAIL | TRACE
includeCodes  List  list of message codes (each one either an Id or a Numeric) to include
excludeCodes  List  list of message codes (each one either an Id or a Numeric) to exclude

This method lets you retrieve a (possibly) filtered list of log messages for the currently running module, the whole pipeline or a custom logger created with start-logger(). The result is a list of opaque multi-element lists as described below.

The name parameter specifies the logger from which to retrieve the log messages. Possible values are:

PIPELINE

all log events for the currently executing pipeline (this also includes the log events created by the currently running module, accessible separately by the logger MODULE, and all modules run earlier in the pipeline)

MODULE

all log events created during execution of the currently running module

<name>

log events created by the custom logger named name (see also start-logger())

With levels, you specify the list of log message levels you are interested in. This can be any of the following levels: FATAL, ERROR, WARN, INFO, DEBUG, VERBOSE, DETAIL, TRACE. If the list is the empty list, all levels are considered.

Each log message has a numerical code, which is defined in the class de.infinityloop.msg.Msg. With includeCodes, you can specify the list of numerical codes or symbolic names (ids) that should be included in the result. With excludeCodes, you can specify the list of numerical codes or symbolic names (ids) that should be excluded from the result. In both cases, when the list is the empty list, no respective filtering is applied.

The described filters are applied in the following order:

  1. From the set of all log messages of the logger designated by the name parameter, only those are considered that match any of the levels in levels. When levels is the empty list, no filtering is applied.

  2. Adding to the subset created in step 1, those log messages are included whose numerical message code matches any of the codes listed in includeCodes. When includeCodes is the empty list, no filtering is applied.

  3. From the subset created in step 2, all those log messages are removed whose numerical message code matches any of the codes listed in excludeCodes. When excludeCodes is the empty list, no filtering is applied.

The resulting set of log messages is the final result set of size N and used for building the result of the function, which is a list of opaque multi-element lists.

{
  { ...private list for message 1... }
  { ...private list for message 2... }
    ...
  { ...private list for message N... }
}

To extract data from a certain instance of such an opaque multi-element list, use get-log-message-component().

Example 5.55. 

The call

get-log-messages( MODULE, {ERROR, WARN}, {}, { -12, -14});

creates its result from all log messages created up to the call in the currently running module by selecting any ERROR or WARN messages whose error codes are not -12 or -14.

The call

get-log-messages( PIPELINE, {WARN}, {-2, -3, -5}, {});

creates its result from all log messages created up to the call in the currently executing pipeline by selecting any WARN messages whose error codes are either -2, -3 or -5.

The call

get-log-messages( sub, {}, {}, {});

creates its result from all log messages created up to the call in the in-scope custom logger named sub started using an earlier call to start-logger() like start-logger( sub, DEBUG ).
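The first call in the example above combines a level filter with a code exclusion. The Python sketch below models only these two steps (step 1 and step 3 of the filter order); step 2, includeCodes, is intentionally not modelled. The tuple layout (level, code, text) and the function name filter_messages are assumptions for illustration, since real log messages are opaque lists.

```python
def filter_messages(messages, levels, exclude_codes):
    """Apply the level filter, then the exclusion filter (hypothetical model).

    messages      -- list of (level, code, text) tuples; an assumed layout
    levels        -- levels to keep; an empty list means no level filtering
    exclude_codes -- codes to drop; an empty list means no exclusion
    """
    # Step 1: keep only messages whose level is listed (or all, if empty).
    selected = [m for m in messages if not levels or m[0] in levels]
    # Step 3: remove messages whose code is listed in exclude_codes.
    if exclude_codes:
        selected = [m for m in selected if m[1] not in exclude_codes]
    return selected


sample = [
    ("ERROR", -12, "first"),
    ("WARN", -3, "second"),
    ("INFO", -1, "third"),
    ("ERROR", -7, "fourth"),
    ("WARN", -14, "fifth"),
]
# Mirrors get-log-messages( MODULE, {ERROR, WARN}, {}, { -12, -14} )
print(filter_messages(sample, ["ERROR", "WARN"], [-12, -14]))
```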


8.10. get-log-messages

get-log-messages(name as Id, filterexp as String) as List

name       Id      PIPELINE | MODULE | any named logger created with start-logger()
filterexp  String  log filter expression

This method lets you retrieve a (possibly) filtered list of log messages for the currently running module, the whole pipeline or a custom logger created with start-logger(). The result is a list of opaque multi-element lists as described below.

The name parameter specifies the logger from which to retrieve the log messages. Possible values are:

PIPELINE

all log events for the currently executing pipeline (this includes the log events created by all modules run earlier in the pipeline as well as those created by the currently running module, which are also accessible separately through the MODULE logger)

MODULE

all log events created during execution of the currently running module

<name>

log events created by the custom logger named name (see also start-logger())

The set of log messages to be retrieved is constructed by applying the specified log message filter expression filterexp to the set of available log messages. The syntax for filterexp is defined here.

The resulting set of log messages is the final result set of size N and used for building the result of the function, which is a list of opaque multi-element lists.

{
  { ...private list for message 1... }
  { ...private list for message 2... }
    ...
  { ...private list for message N... }
}

To extract data from a certain instance of such an opaque multi-element list, use get-log-message-component().

Example 5.56. 

The call

get-log-messages( MODULE, "+ERROR +WARN --12 --14" );

creates its result from all log messages created up to the call in the currently running module by selecting any ERROR or WARN messages whose error codes are not -12 or -14.

The call

get-log-messages( PIPELINE, "+WARN +-2 +-3 +-5" );

creates its result from all log messages created up to the call in the currently executing pipeline by selecting all WARN messages and any messages whose error codes are either -2, -3 or -5.

The call

get-log-messages( sub, "ALL" );

creates its result from all log messages created up to the call in the in-scope custom logger named sub started using an earlier call to start-logger() like start-logger( sub, DEBUG ).


8.11. log

log(level as Id, message as Value, ...) as String

level    Id     TRACE | DETAIL | VERBOSE | DEBUG | INFO | WARN | ERROR | FATAL
message  Value  the log message values which will be concatenated to a string

Outputs the specified message via upCast's logging system. The type of log entry to be generated is set using the level parameter, which can take the values TRACE, DETAIL, VERBOSE, DEBUG, INFO, WARN, ERROR or FATAL.

The function returns a unique identifier for the generated log message in the form mid<integer>.

8.12. log-custom

log-custom(name as Id, level as Id, messagecode as Numeric, logmessage as Value, ...) as String

name         Id       PIPELINE | MODULE | customlogger
level        Id       TRACE | DETAIL | VERBOSE | DEBUG | INFO | WARN | ERROR | FATAL
messagecode  Numeric  the message code (≥0)
logmessage   Value    the log message value(s)

This method lets you add your own, attributed log messages to a pre-defined or custom logger.

name designates the logger to add the message to. Possible values are:

PIPELINE

all log events for the currently executing pipeline (this includes the log events created by all modules run earlier in the pipeline as well as those created by the currently running module, which are also accessible separately through the MODULE logger)

MODULE

all log events created during execution of the currently running module

<name>

log events created by the custom logger named name (see also start-logger())

In level, you specify the log message level you want to set for that message. This can be any of the following levels: FATAL, ERROR, WARN, INFO, DEBUG, VERBOSE, DETAIL, TRACE.

messagecode lets you specify a custom message code. You can use this to find your own specific messages in a logger later using the get-log-messages() function by specifying the respective codes in its includeCodes list.

Important

Your custom message codes must be greater than 0. Any negative codes are reserved by upCast for its own error message constants.

See also: de.infinityloop.msg.Msg

Finally, you can add an arbitrary list of Value objects to be output as the logmessage.

The function returns a unique identifier for the generated log message in the form mid<integer>.

Example 5.57. 

log-custom( sub, WARN, 5, "The value ", $number, " is not equal to 5." );

will create a log message with level WARN and message code 5 in the custom logger sub with the concatenated string representations of the remaining Value objects in the specified order.


8.13. start-logger

start-logger(name as Id, level as Id) as Void

name   Id  an arbitrary name different from PIPELINE and MODULE
level  Id  TRACE | DETAIL | VERBOSE | DEBUG | INFO | WARN | ERROR | FATAL

This method starts a custom logger with name name at the current code nesting level, accepting log messages of the specified level and higher-level messages.
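The phrase "the specified level and higher-level messages" can be pictured with the following Python sketch. It assumes that the levels rank in the order in which they are listed in the parameter table, from TRACE (lowest) to FATAL (highest); this ranking and the helper name accepts are assumptions of the sketch, not statements about upCast's implementation.

```python
# Assumed ranking, lowest to highest, taken from the order of the parameter table.
LEVELS = ["TRACE", "DETAIL", "VERBOSE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def accepts(logger_level: str, message_level: str) -> bool:
    """Return True if a logger started at logger_level would keep the message."""
    return LEVELS.index(message_level) >= LEVELS.index(logger_level)


# A logger started with start-logger( sub, DEBUG ) under this assumption:
print(accepts("DEBUG", "WARN"))     # kept
print(accepts("DEBUG", "VERBOSE"))  # dropped
```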

If the named custom logger already exists, it is not cleared. Newly generated log messages are appended if they satisfy the level condition, which may differ from the one specified when the logger was created. You can also use this to resume collecting log events in this logger after a call to stop-logger().

The following logger names are not allowed (in any combination of uppercase and lowercase characters), since they designate loggers pre-defined by upCast: PIPELINE, MODULE.

Important

The custom logger is only valid at the nesting level at which it was first created (and at all deeper levels). It is automatically disposed of when the UPL program execution flow leaves the defining block.

8.14. stop-logger

stop-logger(name as Id) as Void

name  Id  an arbitrary name different from PIPELINE and MODULE

This method stops adding log messages to the custom logger named name. Logging to this logger can be resumed by calling start-logger() again.

9. Boolean logic functions

9.1. bitwise-and

bitwise-and(n1 as Numeric, n2 as Numeric) as Numeric

n1  Numeric
n2  Numeric

This function returns the value obtained by performing a bitwise-and between n1 and n2.

For this, n1 and n2 are converted to integer numbers by truncating any decimals, the bitwise-and operation is performed, and the resulting integer number is returned (interpreted as a signed, 4-byte value).

Both operands must be of power 0 (i.e. pure numbers).

Example 5.58. 

bitwise-and( 1, 3 )
bitwise-and( 1, 3.1415 )
bitwise-and( 1.999, 3)

all will return 1.


The function will throw a TypeConversionException when not all operands are of power 0.

9.2. bitwise-not

bitwise-not(n as Numeric) as Numeric

n  Numeric

This function returns the value obtained by performing a bitwise-not on n.

For this, n is converted to an integer number by truncating any decimals, the bitwise-not operation is performed, and the resulting integer number is returned (interpreted as a signed, 4-byte value).

The operand must be of power 0 (i.e. a pure number).

Example 5.59. 

bitwise-not( 255 )
bitwise-not( 255.001 )
bitwise-not( 255.999 )

all will return -256.


The function will throw a TypeConversionException when n is not of power 0.

9.3. bitwise-or

bitwise-or(n1 as Numeric, n2 as Numeric) as Numeric

n1  Numeric
n2  Numeric

This function returns the value obtained by performing a bitwise-or between n1 and n2.

For this, n1 and n2 are converted to integer numbers by truncating any decimals, the bitwise-or operation is performed, and the resulting integer number is returned (interpreted as a signed, 4-byte value).

Both operands must be of power 0 (i.e. pure numbers).

Example 5.60. 

bitwise-or( 1, 2 )
bitwise-or( 1, 2.1415 )
bitwise-or( 1.999, 2)

all will return 3.


The function will throw a TypeConversionException when not all operands are of power 0.

9.4. bitwise-xor

bitwise-xor(n1 as Numeric, n2 as Numeric) as Numeric

n1  Numeric
n2  Numeric

This function returns the value obtained by performing a bitwise-xor between n1 and n2.

For this, n1 and n2 are converted to integer numbers by truncating any decimals, the bitwise-xor operation is performed, and the resulting integer number is returned (interpreted as a signed, 4-byte value).

Both operands must be of power 0 (i.e. pure numbers).

Example 5.61. 

bitwise-xor( 1, 3 )
bitwise-xor( 1, 3.1415 )
bitwise-xor( 1.999, 3)

all will return 2.


The function will throw a TypeConversionException when not all operands are of power 0.
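The conversion shared by the four bitwise functions (truncate the decimals, operate on integers, interpret the result as a signed 4-byte value) can be modelled in Python as follows. This is a sketch of the described behaviour only; the helper names are made up for the example, truncation of negative numbers is assumed to go toward zero, and the power-0 check is not modelled.

```python
import math


def to_int32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed 4-byte integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def bitwise_and(n1, n2):
    return to_int32(math.trunc(n1) & math.trunc(n2))


def bitwise_or(n1, n2):
    return to_int32(math.trunc(n1) | math.trunc(n2))


def bitwise_xor(n1, n2):
    return to_int32(math.trunc(n1) ^ math.trunc(n2))


def bitwise_not(n):
    return to_int32(~math.trunc(n))


# The values from the examples in sections 9.1 to 9.4:
print(bitwise_and(1.999, 3), bitwise_not(255.999), bitwise_or(1, 2.1415), bitwise_xor(1, 3))
```

Because the operands are truncated first, 1.999 behaves like 1 and 255.999 like 255, which is why all variants within one example give the same result.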

9.5. exists

exists(value as Value) as Bool

value  Value  value to check for not being null

Returns true when the passed value is not Null.

9.6. exists-var

exists-var(varname as Id) as Bool

varname  Id  qualified variable name

Returns true when an in-scope UPL variable with the same name as the Id passed in varname exists.

If (and only if) the namespace of the Id is one of the variable realm namespaces, this returns true when a realm variable of the same name as varname exists. When this function returns true, $varname and get-var( varname ) will not fail, although the returned value could still be of type Null when the respective realm variable's value is null.

See: get-var()

9.7. false

false() as Bool

Returns a Bool with value false.

9.8. is-null

is-null(value as Value) as Bool

value  Value  value to check against null

Returns true when the passed value is Null.

9.9. not

not(value as Bool) as Bool

value  Bool  value to negate

Returns the negated value of the passed Bool value.

9.10. true

true() as Bool

Returns a Bool with value true.

10. Functions on DOM nodes

10.1. append-attr

append-attr(attname as Id, value as Value) as Void

attname  Id     the attribute name
value    Value  the value to append

Appends the value value to the current value of the attribute attname on the context node by first appending a space character U+0020, then the passed value. The value is converted to a String before being appended, as if by using the to-string() function on it.

If the attribute does not yet exist, the attribute attname is first created and then assigned the value of value.

The function returns true when the operation succeeded, false otherwise. It throws an exception when trying to set an attribute on a non-element node, or if the respective attribute is not allowed to be set on the context element (e.g. uci:class on an element other than uci:inline or uci:par).
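The append rule (a single space U+0020 as separator, creation of the attribute if it is missing) is illustrated by this Python sketch. The attributes of the context element are modelled as a plain dict, and Python's str() stands in for to-string(); both are simplifications assumed for the example.

```python
def append_attr(attrs: dict, attname: str, value) -> None:
    """Append value to attrs[attname], separated by one space (illustrative model)."""
    text = str(value)  # stand-in for the to-string() conversion
    if attname in attrs:
        attrs[attname] = attrs[attname] + " " + text
    else:
        # The attribute does not exist yet: create it with the value as is.
        attrs[attname] = text


attrs = {}
append_attr(attrs, "class", "note")
append_attr(attrs, "class", "important")
print(attrs)  # {'class': 'note important'}
```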

10.2. append-attr

append-attr(target as String, attname as Id, value as Value) as Void

target   String  xpath to element
attname  Id      the attribute name
value    Value   the value to append

Appends the value value to the current value of the attribute attname on all element nodes selected by the target XPath 1.0 expression (evaluated relative to the context node) by first appending a space character U+0020, then the passed value. The value is converted to a String before being appended, as if by using the to-string() function.

If the attribute does not yet exist, the attribute attname is first created and then assigned the value of value.

The function returns true when the target expression selects at least one target element and the operation succeeded, false otherwise. It throws an exception when trying to set an attribute on a non-element node, or if the respective attribute is not allowed to be set on the context element (e.g. uci:class on an element other than uci:inline or uci:par).

10.3. attach-value

attach-value(key as Id, value as Value) as Void

key    Id     value key
value  Value  value to attach to the node

This method lets you attach a UPL value with name key and associated value value to the context node. This value can be queried using the get-value() function.

The function always returns true.

Note

The difference from setting an attribute on the context node is that a value can also be attached to nodes that do not support attributes (like Text nodes, PI nodes or Comment nodes), and that the type information of the value is retained.

However, attached values are never serialized to XML.

10.4. attach-value

attach-value(target as String, ..., key as Id, value as Value) as Void

target  String  XPath 1.0 expression selecting node(s) to attach value to
key     Id      value key
value   Value   value to attach

This method lets you attach a UPL value with name key and associated value value to all nodes selected by the target XPath 1.0 expression (evaluated relative to the context node). This value can later be queried using the get-value() function.

The function returns true when the target expression selects at least one target node, false otherwise.

Example 5.62. 

#namespace uci "http://www.infinity-loop.de/namespace/2006/upcast-internal";

[element(uci:par) and @uci:heading-level > 0] {
  attach-value( "descendant::text()", heading-text, true() );
}

will attach the Bool value true to all text nodes that are descendants of a heading paragraph.


10.5. comment

comment() as Bool

Tests whether the context node is a Comment node.

10.6. delete

delete() as Void

This function deletes the context node (including all of its children) from the internal document tree.

Note

The context node is not removed until the next context node is selected during processing, as otherwise no context node would be available during processing. This function merely flags it as to be deleted at the next context node change.

10.7. detach-values

detach-values(valNamePatt as Id) as Bool

valNamePatt  Id  value name or value name pattern

Removes a named single attached value or a number of values matching a pattern from the context node. The value name or pattern can be specified in valNamePatt.

When valNamePatt is a qualified name, that value is removed.

When valNamePatt is nsprefix:* , all values in the specified namespace with prefix nsprefix are removed.

When valNamePatt is *:key, all values with the local name key are removed from the node, regardless of the namespace they might be in. This does not include the null namespace!

When valNamePatt is :*, all values are removed from the context node that are in the null namespace.

The method returns a List of Ids representing the qualified names of all values that were actually removed from the context node.

To remove all values from the context node, regardless of name and/or namespace, use the function detach-values().

If valNamePatt does not exist or does not match any value present on the context node, the method does nothing.
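The four pattern forms can be modelled with the Python sketch below. Attached values are represented as a dict keyed by "prefix:local" strings, with names in the null namespace written without a prefix. Matching on prefixes rather than on namespace URIs, and the helper names, are simplifications assumed for the example.

```python
def value_matches(pattern: str, name: str) -> bool:
    """Check one attached-value name against a valNamePatt-style pattern."""
    pat_prefix, _, pat_local = pattern.rpartition(":")
    prefix, _, local = name.rpartition(":")
    if pat_prefix == "*":
        # *:key matches key in any namespace, but not in the null namespace.
        return prefix != "" and local == pat_local
    if pat_local == "*":
        # nsprefix:* matches the whole namespace; :* matches the null namespace.
        return prefix == pat_prefix
    return (prefix, local) == (pat_prefix, pat_local)


def detach_values(values: dict, pattern: str) -> list:
    """Remove all matching values and return the names actually removed."""
    removed = [name for name in values if value_matches(pattern, name)]
    for name in removed:
        del values[name]
    return removed


values = {"flag": 1, "my:flag": 2, "my:size": 3, "other:flag": 4}
print(detach_values(values, "*:flag"))  # ['my:flag', 'other:flag']
print(values)                           # {'flag': 1, 'my:size': 3}
```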

10.8. detach-values

detach-values() as Bool

This method removes all attached values from the context node.

The method returns a List of Ids representing the qualified names of all values that were actually removed from the context node.

10.9. element

element() as Bool

Tests whether the context node is an element.

10.10. element

element(qname as Id) as Bool

qname  Id  qualified element name to test for

Tests whether the context node has the same namespace and local name as the passed qualified name.

10.11. filter-attrs

filter-attrs(filterSpec as List) as Void

filterSpec  List  list of two-element lists

Note

This function should only be used in an XML Exporter's attribute filter program.

This method lets you filter, i.e. remove, attributes by name or pattern from an element.

This is most useful before an element is serialized to a file or character stream. Since upCast internally produces a huge number of attributes (mostly by exploding CSS style properties into real attributes), the result can get unmanageably large. Usually, however, you are only interested in a small set of attributes for further processing, and this function helps you specify which attributes to keep in the serialized XML tree and which to discard.

The filterSpec is a List of two-element Lists. Each two-element List consists of two pieces of information:

  1. The first element is either INCLUDE or EXCLUDE and defines whether the attribute(s) matched by the second element are to be included in or excluded from the serialized tree.

  2. The second element is either an attribute name or an attribute pattern (see below).

The second element supports the following patterns:

"*"

matches all attributes, regardless of whether they are in a namespace or the null namespace. Note that this is the only pattern that can (and must) be specified as a String.

:*

matches all attributes in the null namespace

*:*

matches all attributes that are in a non-null namespace

prefix:*

matches all attributes in the namespace bound to the specified prefix

*:name

matches any attribute that has a local name of name

prefix:name

matches exactly the attribute name that is in the namespace that is bound to the prefix prefix

name

matches the attribute name in the null namespace

Each attribute present on the context node is filtered against the list of inclusion/exclusion patterns. The matching is done in order of listing.

The function returns a List of the Ids (names) of the attributes that have been removed from the element.

The workings of this function are best described with an example:

Example 5.63. 

Suppose we want to only include all attributes in the uci namespace except for uci:fullStyle and uci:diffStyle on the current context element. How can this be expressed easily?

Well, we just translate the above sentence word by word into the respective filter expression as follows:

filter-attrs(
  {
    { EXCLUDE, "*" }, // "only include..." means we start with nothing, i.e. exclude all
    { INCLUDE, uci:* }, // "...all attributes in the uci namespace..."
    { EXCLUDE, uci:fullStyle }, // "...except for uci:fullStyle..."
    { EXCLUDE, uci:diffStyle } // "...and uci:diffStyle."
  } );

Now, filter-attrs() takes each attribute on the context element and matches it in turn against each of the filter entries. If the attribute matches the pattern, it is tagged with the specified action, INCLUDE or EXCLUDE, overwriting the tag previously assigned to it.

If the attribute doesn't match a pattern, its current action tag is not changed.

The initial tag for each attribute is INCLUDE.

After the attribute has been matched against the complete list of filters, it carries either an EXCLUDE or an INCLUDE tag, and that tag designates how the attribute will be handled.

Simple, isn't it? You can mix EXCLUDE and INCLUDE filter elements as you like and achieve complex filters with very few lines of code.

Let's examine that algorithm for the sample attribute uci:fullStyle. Initially, its tag is INCLUDE. Now, for the nth specified entry in the filter list, the following happens:

  1. The attribute matches the pattern *. It is assigned the EXCLUDE tag.

  2. The attribute matches the pattern uci:*. It is assigned the INCLUDE tag.

  3. The attribute matches the pattern uci:fullStyle. It is assigned the EXCLUDE tag.

  4. The attribute doesn't match the pattern uci:diffStyle. Its tag is not changed.

At this point, all filter entries have been matched and possibly applied, and the last tag assigned to the attribute is EXCLUDE. Therefore, the attribute uci:fullStyle is excluded, i.e. removed from the context element (and, as a result, excluded from serialization, for example).
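The tagging algorithm walked through above can be sketched in Python as follows. Attributes are modelled as a dict keyed by "prefix:local" strings, matching is done on prefixes instead of namespace URIs, and *:name is read literally as "any attribute with that local name"; these choices and the function names are assumptions of the sketch.

```python
def pattern_matches(pattern: str, attr: str) -> bool:
    """Match one attribute name against one filter pattern (illustrative)."""
    if pattern == "*":
        return True                  # every attribute
    prefix, _, local = attr.rpartition(":")
    if pattern == "*:*":
        return prefix != ""          # any non-null namespace
    pat_prefix, _, pat_local = pattern.rpartition(":")
    if pat_prefix == "*":
        return local == pat_local    # *:name
    if pat_local == "*":
        return prefix == pat_prefix  # prefix:* and :*
    return prefix == pat_prefix and local == pat_local  # prefix:name and name


def filter_attrs(attrs: dict, filter_spec: list) -> list:
    """Tag each attribute, remove those tagged EXCLUDE, return their names."""
    removed = []
    for name in list(attrs):
        tag = "INCLUDE"              # initial tag of every attribute
        for action, pattern in filter_spec:
            if pattern_matches(pattern, name):
                tag = action         # a later match overwrites the tag
        if tag == "EXCLUDE":
            removed.append(name)
            del attrs[name]
    return removed


attrs = {"uci:fullStyle": "a", "uci:diffStyle": "b", "uci:class": "c", "id": "p1"}
spec = [("EXCLUDE", "*"), ("INCLUDE", "uci:*"),
        ("EXCLUDE", "uci:fullStyle"), ("EXCLUDE", "uci:diffStyle")]
print(filter_attrs(attrs, spec))  # ['uci:fullStyle', 'uci:diffStyle', 'id']
print(attrs)                      # {'uci:class': 'c'}
```

Since only the last matching entry decides, the order of the entries in the filter list matters: moving the "*" entry to the end would exclude everything.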


10.12. filter-attrs

filter-attrs(filterSpec as List, flags as List) as List

filterSpec  List  List of two-element Lists
flags       List  list of Ids, one or more of: RETURN-REMOVED, RETURN-REMAINING, PREVIEW, WITH-VALUES

This method lets you filter, i.e. remove, attributes by name or pattern from an element, or simply check for the presence of a bunch of attributes in one simple call.

The function works just like the single-parameter filter-attrs() (see there for the detailed specification of the filterSpec parameter), but the flags parameter gives you more control over what the function does and what it returns. flags takes a list of options (each an Id) that modify the standard behaviour of the function:

RETURN-REMOVED

returns a List of Ids of the names of the attributes that have been removed from the element

RETURN-REMAINING

returns a List of Ids of the names of the attributes that remain on the element after filtering has taken place

PREVIEW

when specified, no attributes are actually removed from the element; instead, the action is just simulated. This is only useful in conjunction with either the RETURN-REMOVED or RETURN-REMAINING option, so you can check beforehand what would happen if you called filter-attrs() without the PREVIEW option

WITH-VALUES

only useful when either RETURN-REMOVED or RETURN-REMAINING is specified: in this case, the return value is not just a List of Ids, but a List of two-element Lists where the latter contain the attribute name (as Id) as the first element, and the attribute value as the second element

Example 5.64. Examples

For the following examples, suppose the context element is

<uci:par a="a" b="b" my:color="red" my:font-size="12pt"/>

Example 1

filter-attrs( {{ EXCLUDE, "*"},{ INCLUDE, my:* }}, { RETURN-REMOVED } )

will return (with elements in arbitrary order)

{ a, b }

and leave the element as

<uci:par my:color="red" my:font-size="12pt"/>

Example 2

filter-attrs( {{ EXCLUDE, "*"},{ INCLUDE, my:* }}, { RETURN-REMAINING } )

will return (with elements in arbitrary order)

{ my:color, my:font-size }

and leave the element as

<uci:par my:color="red" my:font-size="12pt"/>

Example 3

filter-attrs( {{ EXCLUDE, "*"},{ INCLUDE, my:* }}, { PREVIEW, RETURN-REMOVED } )

will return (with elements in arbitrary order)

{ a, b }

and will not modify the context element.

Example 4

filter-attrs( {{ EXCLUDE, "*"},{ INCLUDE, my:* }}, { RETURN-REMOVED, WITH-VALUES } )

will return (with elements in arbitrary order)

{ {a, "a"}, {b, "b"} }

and leave the element

<uci:par my:color="red" my:font-size="12pt"/>

Example 5

filter-attrs( {{ EXCLUDE, "*"},{ INCLUDE, my:* }}, { RETURN-REMOVED, RETURN-REMAINING, PREVIEW, WITH-VALUES } )

will return (with elements in arbitrary order)

{ {a, "a"}, {b, "b"}, {my:color, "red"}, {my:font-size, "12pt"} }

and will not modify the context element.

Example 6

filter-attrs( {{ EXCLUDE, "*"},{ INCLUDE, my:* }}, {} )

will return

null

and leave the element as

<uci:par my:color="red" my:font-size="12pt"/>

meaning that

filter-attrs( {{ EXCLUDE, "*"},{ INCLUDE, my:* }}, {} )

is equivalent to

filter-attrs( {{ EXCLUDE, "*"},{ INCLUDE, my:* }} )
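How the flags shape the return value in the examples above can be summarized by this Python sketch. It starts from an already computed list of attribute names to remove (here a and b) and models only the effect of the flags; the function name apply_flags, the dict model of the element and the returned pair (result, resulting attributes) are assumptions of the sketch. The order of the returned elements is arbitrary in upCast, whereas the sketch simply keeps dict order.

```python
def apply_flags(attrs: dict, removed_names: list, flags: list):
    """Return (result, attributes left on the element) for a set of flags."""
    remaining = {n: v for n, v in attrs.items() if n not in removed_names}

    names = []
    if "RETURN-REMOVED" in flags:
        names += removed_names
    if "RETURN-REMAINING" in flags:
        names += list(remaining)

    if "RETURN-REMOVED" not in flags and "RETURN-REMAINING" not in flags:
        result = None                             # nothing requested (Example 6)
    elif "WITH-VALUES" in flags:
        result = [[n, attrs[n]] for n in names]   # name/value pairs
    else:
        result = names

    # PREVIEW only simulates the removal and leaves the element untouched.
    after = dict(attrs) if "PREVIEW" in flags else remaining
    return result, after


element = {"a": "a", "b": "b", "my:color": "red", "my:font-size": "12pt"}
print(apply_flags(element, ["a", "b"], ["PREVIEW", "RETURN-REMOVED"]))
```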

10.13. get-attr

get-attr(attrName as Id) as Value

attrName  Id  qualified attribute name

Returns the value of the attribute attrName on the context node. When the attribute does not exist or the context node is not an element, a Null value is returned.

This method backs the shortcut @attrName in UPL Core.

Example 5.65. 

@uci:class

is equivalent to

get-attr( uci:class )

and will return the value of the attribute uci:class on the context node (or a Null value, if it does not exist).


10.14. get-attr

get-attr(target as String, attrName as Id) as Value

target    String  target xpath, must select exactly one node
attrName  Id      qualified attribute name

Returns the value of the attribute attrName on the node selected by the XPath 1.0 expression target. When the attribute does not exist on the target node a Null value is returned.

When target selects more than a single node, an EvalException is thrown.

Example 5.66. 

get-attr( "/document/par[1]", uci:class )

will return the value of the attribute uci:class on the first par child of the document element (or a Null value, if it does not exist).


10.15. get-attr

get-attr(target as String, attrName as Id, fallback as Value) as Value

target    String  target xpath, must select exactly one node
attrName  Id      qualified attribute name
fallback  Value

Returns the value of the attribute attrName on the node selected by the XPath 1.0 expression target. When the attribute does not exist on the target node, the fallback value is returned.

When target selects more than a single node, an EvalException is thrown.

Example 5.67. 

get-attr( "/document/par[1]", heading-level, 0 )

will return the value of the attribute heading-level on the first par child of the document element, or a Numeric value of 0 if it does not exist.


10.16. get-attr

get-attr(attrName as Id, fallback as Value) as Value

attrName  Id     qualified attribute name
fallback  Value  fallback value when attribute does not exist

Returns the value of the attribute attrName on the context node. When the attribute does not exist or the context node is not an element, fallback is returned instead.

Example 5.68. 

get-attr( heading-level, 0 )

will return the value of the attribute heading-level on the context node (as String), or a Numeric value of 0 if it does not exist.
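The difference between the variants with and without a fallback can be pictured with a small Python sketch. The element's attributes are modelled as a dict of strings and Python's None stands in for Null; the function name and this model are assumptions for illustration only.

```python
def get_attr(attrs: dict, attr_name: str, fallback=None):
    """Return the attribute value, or fallback (None models Null) if it is absent."""
    return attrs.get(attr_name, fallback)


heading = {"heading-level": "2"}
plain = {}
print(get_attr(heading, "heading-level", 0))  # the stored String "2"
print(get_attr(plain, "heading-level", 0))    # the Numeric fallback 0
print(get_attr(plain, "heading-level"))       # None, modelling Null
```

Note that, as in the example above, an existing attribute yields a String while the fallback keeps its own type, so callers may need to convert before comparing the two.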


10.17. get-value

get-value(key as Id) as Value

key  Id  key to get the value for

This method retrieves any value with name key from the context node that was previously set on that node using the attach-value() function.

If such a value does not exist on the context node, the method returns Null.

10.18. insert-nodes

insert-nodes(mode as Id, xmlsource as String) as Bool

mode       Id      BEFORE | AFTER | FIRST-CHILD | LAST-CHILD
xmlsource  String  well-formed XML source code fragment (tree or forest) to insert

This function inserts the parsed XML xmlsource document fragment in accordance with the specified mode relative to the context node.

The mode parameter can have the following values:

BEFORE

the document fragment is inserted immediately before the context node as its preceding sibling

AFTER

the document fragment is inserted immediately after the context node as its following sibling

FIRST-CHILD

the document fragment is inserted as first child of the context node

LAST-CHILD

the document fragment is inserted as last child of the context node

Since xmlsource is parsed before being inserted into the document tree, it must be well-formed.

Any namespaces and prefixes used in the xmlsource string must be declared and in-scope in the UPL program at the calling position of the function.

The function returns true when the target expression selects at least one target node, false otherwise.

Note

Obviously, you cannot call this method on the document root element when mode is BEFORE or AFTER. Trying to do so will result in an exception being thrown.

Example 5.69. 

#namespace uci "http://www.infinity-loop.de/namespace/2006/upcast-internal";

[element(uci:item)] {
  insert-nodes( BEFORE, "<uci:marker>" + @uci:numberingtext + "</uci:marker>" );
}

inserts an element uci:marker before every list item element and makes its text content the numbering text string. When the document tree looks like

<uci:item uci:numberingtext="(a)">Item a</uci:item>
<uci:item uci:numberingtext="(b)">Item b</uci:item>
…

, then running the above UPL Tree Processor rule results in the following:

<uci:marker>(a)</uci:marker><uci:item uci:numberingtext="(a)">Item a</uci:item>
<uci:marker>(b)</uci:marker><uci:item uci:numberingtext="(b)">Item b</uci:item>
…
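The four insertion modes can be modelled with Python's xml.dom.minidom as in the sketch below. It is an illustration of the described semantics, not upCast code: the function name insert_nodes, the temporary wrapper element used to parse a forest, and the ValueError are assumptions, and unprefixed element names are used to keep namespace handling out of the example.

```python
from xml.dom import minidom, Node


def insert_nodes(context, mode, xmlsource):
    """Insert the parsed fragment relative to context (illustrative model)."""
    # Wrap the source so that a forest (several top-level nodes) parses too.
    wrapper = minidom.parseString("<wrapper>" + xmlsource + "</wrapper>")
    doc = context.ownerDocument
    new_nodes = [doc.importNode(n, True) for n in wrapper.documentElement.childNodes]

    parent = context.parentNode
    if mode in ("BEFORE", "AFTER") and parent.nodeType == Node.DOCUMENT_NODE:
        raise ValueError("cannot insert siblings of the document root element")

    if mode == "BEFORE":
        for node in new_nodes:
            parent.insertBefore(node, context)
    elif mode == "AFTER":
        ref = context.nextSibling      # None means: append at the end
        for node in new_nodes:
            parent.insertBefore(node, ref)
    elif mode == "FIRST-CHILD":
        ref = context.firstChild
        for node in new_nodes:
            context.insertBefore(node, ref)
    elif mode == "LAST-CHILD":
        for node in new_nodes:
            context.appendChild(node)


doc = minidom.parseString('<list><item n="(a)">Item a</item><item n="(b)">Item b</item></list>')
for item in list(doc.getElementsByTagName("item")):
    insert_nodes(item, "BEFORE", "<marker>" + item.getAttribute("n") + "</marker>")
print(doc.documentElement.toxml())
```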

10.19. insert-nodes

insert-nodes(target as String, mode as Id, xmlsource as String) as Bool

target     String  XPath 1.0 expression selecting the target node(s)
mode       Id      BEFORE | AFTER | FIRST-CHILD | LAST-CHILD
xmlsource  String  well-formed XML source code fragment (tree or forest) to insert

This function inserts the parsed XML xmlsource document fragment in accordance with the specified mode relative to each of the target nodes selected by the target XPath 1.0 expression.

The mode parameter can have the following values:

BEFORE

the document fragment is inserted immediately before each target node as its preceding sibling

AFTER

the document fragment is inserted immediately after each target node as its following sibling

FIRST-CHILD

the document fragment is inserted as first child of each target node

LAST-CHILD

the document fragment is inserted as last child of each target node

Since xmlsource is parsed before being inserted into the document tree, it must be well-formed.

Any namespaces and prefixes used in the xmlsource string must be declared and in-scope in the UPL program at the calling position of the function.

The function returns true when the target expression selects at least one target node, false otherwise.

Note

You cannot call this method with a target that is the document root element (when mode is BEFORE or AFTER) or any other target node that does not allow the document fragment to be inserted as its previous or following sibling. Trying to do so will result in an exception being thrown.

Example 5.70. 

#namespace uci "http://www.infinity-loop.de/namespace/2006/upcast-internal";

[element(uci:par)] {
  insert-nodes( "descendant::uci:inline", BEFORE, "<uci:start/>" );
  insert-nodes( "descendant::uci:inline", AFTER, "<uci:end/>" );
}

will surround each descendant uci:inline element of a paragraph by the empty elements uci:start and uci:end. When the document tree looks like

<uci:par>This is 
  <uci:inline csso:font-weight="bold">bold and 
    <uci:inline csso:font-style="italic">bold-italic</uci:inline>
  </uci:inline> text.</uci:par>

(indented for clarity), then running the above UPL Tree Processor rule results in the following:

<uci:par>This is 
  <uci:start/><uci:inline csso:font-weight="bold">bold and 
    <uci:start/><uci:inline csso:font-style="italic">bold-italic</uci:inline><uci:end/>
  </uci:inline><uci:end/> text.</uci:par>

10.20. mark-split

mark-split(where as Id, condition as BoolExpression, mode as Id) as Void

where (Id): BEFORE | AFTER | BOTH | OFF
condition (BoolExpression): boolean expression to evaluate with the actual context later
mode (Id): SPLIT | BELOW

This function puts a split marker onto the context node which defines various properties of a tree splitting action to be performed after the tree traversal of this UPL Tree Processor run.

The parameter where indicates where in relation to the context node the split should be performed:

BEFORE

split the tree immediately before the context node

AFTER

split the tree after the context node

BOTH

split the tree both before and after the context node

OFF

remove any previously set splitting marks on this node; the parameters condition and mode are not used

The parameter condition identifies the node in the ancestor chain up to which the tree should be split. That node is the first node on the ancestor axis (starting from the context node) for which the specified condition evaluates to true.

The mode parameter determines how the node identified by the condition expression is to be interpreted:

SPLIT

the node identified is the last one in the ancestor chain to get split

BELOW

the node identified is the first one in the ancestor chain that must not get split

Example 5.71. 

Assuming the following XML source

<document>
  <par>ABC<br/>DEF.</par>
  <par>AB<span class="a">C<br/>D</span>EF.</par>
  <par>AB<i class="b"><br/>C</i>DEF.</par>
  <par>AB<em class="c">C<br/></em>DEF.</par>
</document>

and an UPL rule of

[element(br)] {
  mark-split( AFTER, element(span) or element(par), SPLIT );
}

the following result is achieved after tree traversal and performing the actual splitting:

1  <document>
2    <par>ABC<br/></par><par>DEF.</par>
3    <par>AB<span class="a">C<br/></span><span class="a">D</span>EF.</par>
4    <par>AB<i class="b"><br/></i></par><par><i class="b">C</i>DEF.</par>
5    <par>AB<em class="c">C<br/></em></par><par>DEF.</par>
6  </document>

Explanation:

line 2: The nearest ancestor of the br element that first satisfies the condition is the par element. Therefore, that is the topmost one to get split into two.

line 3: The nearest ancestor of the br element that first satisfies the condition is the span element. Therefore, that is the topmost one to get split into two. Note how the attribute is automatically copied to the cloned span node as well.

line 4: The nearest ancestor of the br element that first satisfies the condition is the par element. Therefore, that is the topmost one to get split into two. Note how the i element is cloned (incl. its class attribute) as required.

line 5: The nearest ancestor of the br element that first satisfies the condition is the par element. Therefore, that is the topmost one to get split into two. Although the split takes place after the br element but still within the em element, no clone of the em element appears in the second par element, because that clone would be completely empty.
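
For comparison, here is a hedged sketch of the BELOW mode applied to the same source. The rule is built only from the parameter descriptions above and is not an example taken from the product. The outcome described is what those descriptions imply and has not been verified against upCast.

[element(br)] {
  // document is the first ancestor that must NOT get split,
  // so every ancestor below it (par and any inline wrapper) is split
  mark-split( AFTER, element(document), BELOW );
}

Under that reading, the second par of the source would be split at the par level rather than at the span level, yielding two par elements that each contain a span clone carrying class="a".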


10.21. mark-split

mark-split(where as Id, condition as BoolExpression, mode as Id, attachmentValue as Value) as Void

where (Id): BEFORE | AFTER | BOTH | OFF
condition (BoolExpression): boolean expression to evaluate with the actual context later
mode (Id): SPLIT | BELOW
attachmentValue (Value): any UPL value to be attached to the topmost node that actually got split

This function puts a split marker onto the context node which defines various properties of a tree splitting action to be performed after the tree traversal of this UPL Tree Processor run.

This function behaves exactly like the three-parameter mark-split() (see there for the core semantics of the splitting actions), except that it additionally attaches an UPL value of your choice, attachmentValue, to each of the topmost nodes (in ancestor direction) that actually got split when the Splitter post-processing step ran.

This value is then stored on those nodes under the value keys uci:split-before-value and uci:split-after-value, depending on which value was used for the where parameter. When BOTH was used, both value keys are assigned the same attachmentValue.

After the Splitter has run, the values can be queried on those nodes by calling:

get-value( uci:split-before-value )

or, respectively,

get-value( uci:split-after-value )
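
A minimal sketch of how the attachment might be used. The element names br and par, the String "line-break" and the second rule are assumptions chosen for illustration. The second rule is assumed to run in a later UPL Tree Processor step, after the Splitter has finished. What get-value() returns on nodes that carry no such value is not covered here.

#namespace uci "http://www.infinity-loop.de/namespace/2006/upcast-internal";

[element(br)] {
  // mark a split after each br and tag the topmost node that gets split
  mark-split( AFTER, element(par), SPLIT, "line-break" );
}

// in a later run, after the Splitter post-processing step:
[element(par)] {
  debug( "split-after value:", get-value( uci:split-after-value ) );
}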

10.22. move-nodes

move-nodes(srcXpath as String, destXpath as String, mode as Id) as Bool

srcXpath (String): XPath selecting the source node(s)
destXpath (String): XPath selecting the destination node (one and only one!)
mode (Id): BEFORE | AFTER | FIRST-CHILD | LAST-CHILD

This function moves the nodes selected by srcXpath in accordance with the specified mode relative to the single target node selected by the destXpath XPath 1.0 expression. The moved nodes keep their original positional relation (document order) with respect to each other after they have been moved.

Important

The set of nodes to be moved must not contain the current context node.

The mode parameter can have the following values:

BEFORE

the source nodes are inserted immediately before the target node as its preceding siblings

AFTER

the source nodes are inserted immediately after the target node as its following siblings

FIRST-CHILD

the source nodes are inserted as the first children of the target node

LAST-CHILD

the source nodes are inserted as the last children of the target node

The function returns true when the srcXpath expression selects at least one source node and the destXpath expression selects exactly one target node, false otherwise.

Note

You cannot call this method with a target that is the document root element (when mode is BEFORE or AFTER) or any other target node that does not allow the source nodes to be inserted as its preceding or following siblings. Trying to do so will result in an exception being thrown.

The function also throws an EvalException when the current context node is among the nodes selected by srcXpath.
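
A short sketch of a typical call. The element names section, footnote and par are made up for illustration, and the XPath expressions are assumed to be evaluated relative to the context node, as in the insert-nodes() example.

[element(section)] {
  // move all footnote children behind the last par child of the section;
  // the context node (section) itself is not among the moved nodes
  if( !move-nodes( "child::footnote", "child::par[last()]", AFTER ) ) {
    debug( "no footnotes moved in", name() );
  }
}

The return value is false both when no footnote is selected and when destXpath does not select exactly one node, so the debug() output alone does not tell the two cases apart.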

10.23. move-nodes

move-nodes(srcXpath as String, destXpath as String, mode as Id, parentElem as Id) as Bool

srcXpath (String): XPath selecting the source node(s)
destXpath (String): XPath selecting the destination node (one and only one!)
mode (Id): BEFORE | AFTER | FIRST-CHILD | LAST-CHILD
parentElem (Id): name of new parent to gather source nodes under

This function moves the nodes selected by srcXpath in accordance with the specified mode relative to the single target node selected by the destXpath XPath 1.0 expression. Furthermore, the moved nodes are gathered under a newly created element with name parentElem. The moved nodes keep their original positional relation (document order) with respect to each other after they have been moved.

Important

The set of nodes to be moved must not contain the current context node.

The mode parameter can have the following values:

BEFORE

the new element (with the source nodes as children) is inserted immediately before the target node as its preceding sibling

AFTER

the new element (with the source nodes as children) is inserted immediately after the target node as its following sibling

FIRST-CHILD

the new element (with the source nodes as children) is inserted as first child of the target node

LAST-CHILD

the new element (with the source nodes as children) is inserted as last child of the target node

The function returns true when the srcXpath expression selects at least one source node and the destXpath expression selects exactly one target node, false otherwise.

Note

You cannot call this method with a target that is the document root element (when mode is BEFORE or AFTER) or any other target node that does not allow the new element to be inserted as its preceding or following sibling. Trying to do so will result in an exception being thrown.

The function also throws an EvalException when the current context node is among the nodes selected by srcXpath.

Note

This function behaves exactly like the three-parameter move-nodes(), except that it additionally groups the moved nodes under a newly created element before re-inserting them into the tree.
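
A short sketch of the grouping variant. The element names section, footnote, par and footnotes are made up for illustration, and the XPath expressions are assumed to be evaluated relative to the context node.

[element(section)] {
  // gather all footnote children in one new footnotes element
  // that is placed after the last par child of the section
  move-nodes( "child::footnote", "child::par[last()]", AFTER, footnotes );
}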

10.24. name

name() as String

Returns the qualified name of the context node.

Note

This returns the qualified name of the context node as it is specified in the source document. This means that any namespace prefixes are the ones declared and used in the internal document, not the ones declared and used in the UPL code.

10.25. processing-instruction

processing-instruction() as Bool

Tests whether the context node is a Processing Instruction node.

10.26. remove-attrs

remove-attrs(attrNamePatt as Id) as List

attrNamePatt (Id): qualified attribute name or name pattern

Removes a named single attribute or a number of attributes matching a pattern from the context node. The attribute name or pattern can be specified in attrNamePatt.

When attrNamePatt is a qualified attribute name, that attribute is removed.

When attrNamePatt is nsprefix:* , all attributes in the specified namespace with prefix nsprefix are removed.

When attrNamePatt is *:attname, all attributes with the local name attname are removed from the node, regardless of the namespace they might be in. This does not include the null namespace!

When attrNamePatt is :*, all attributes of the context node that are in the null namespace are removed.

The method returns a List of Ids representing the qualified names of all attributes that were actually removed from the context node.

If attrNamePatt does not match any attribute present on the context node, the method does nothing.

To remove all attributes from the context node, regardless of name and/or namespace, use the function remove-attrs() with no parameters.

Note:

This function is equivalent to a call of the two-parameter remove-attrs() like the following:

remove-attrs( ".", attrNamePatt );
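
A minimal sketch that uses the returned List. The element name par is made up, and declaring a variable inside a rule body is assumed to work as it does inside a function body.

[element(par)] {
  variable $removed as List := { };
  // remove every attribute of the par that is in the null namespace
  $removed := remove-attrs( :* );
  debug( "removed attributes:", $removed );
}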

10.27. remove-attrs

remove-attrs() as List

Removes all attributes from the context node, regardless of (local) name and/or namespace.

The method returns a List of Ids representing the qualified names of all attributes that were actually removed from the context node.

Note:

This function is equivalent to the following sequence of remove-attrs() calls:

remove-attrs( *:* );
remove-attrs( :* );

10.28. remove-attrs

remove-attrs(target as String, attrNamePatt as Id) as List

target (String): XPath 1.0 expression selecting node(s) to remove attributes from
attrNamePatt (Id): qualified attribute name or pattern

Removes a named single attribute or a number of attributes matching a pattern from all element nodes selected by the target XPath 1.0 expression (evaluated relative from the context node).

The function returns the qualified names of all removed attributes as a List of Ids. Because several nodes may be selected, that list may contain duplicate Ids. If no attribute was removed, an empty list is returned.

If the target XPath selects a non-element node, the function will throw an exception.

Important

You cannot remove attributes in the css, csso and cssc namespaces. Attributes in those namespaces are synthesized and read-only. An attempt to remove an attribute in these namespaces is silently ignored.

The attribute name or pattern can be specified in attrNamePatt.

When attrNamePatt is a qualified attribute name, that attribute is removed.

When attrNamePatt is nsprefix:* , all attributes in the specified namespace with prefix nsprefix are removed.

When attrNamePatt is *:attname, all attributes with the local name attname are removed from the node, regardless of the namespace they might be in. This does not include the null namespace!

When attrNamePatt is :*, all attributes of the selected nodes that are in the null namespace are removed.

If attrNamePatt does not match any attribute present on a selected node, the method does nothing for that node.

Tip:

To remove all attributes, regardless of name and/or namespace, use the following code sequence:

remove-attrs( xpath, *:* );
remove-attrs( xpath, :* );

10.29. rename-element

rename-element(newName as Id) as Void

newName (Id): the element's new qualified name

When the context node is an element, renames that element to the specified newName.
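
A one-rule sketch; the element names b and strong are made up for illustration.

[element(b)] {
  // every b element becomes a strong element; attributes and children are not touched by this call
  rename-element( strong );
}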

10.30. replace-with-children

replace-with-children() as Void

This function replaces the context node by its children, i.e. it effectively removes the context node from the tree, moving its children onto its parent.

Note

The context node is not removed until the next context node is selected during processing, as otherwise no context node would be available during processing. However, it no longer has children; they have already been moved so that they directly follow it as siblings. Keep this in mind when performing further actions or evaluating XPath expressions from the context node after calling this function.
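
A small sketch of the effect; the element names par and span are made up for illustration. Given the source

<par>Some <span>wrapped</span> text.</par>

a rule like

[element(span)] {
  // unwrap the span: its children take its place in the par
  replace-with-children();
}

would leave the following once processing has moved on to the next context node:

<par>Some wrapped text.</par>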

10.31. replace-with-text

replace-with-text(data as String) as Void

data (String): replacement text

This function replaces the context node (including its descendants, if any), by a single Text node with the string contents as specified in the data parameter.

Example 5.72. 

Given the source XML

<city>M<entity name="uuml"/>nchen</city>

a rule like

[element(entity)] {
  if( @name="auml" ) {
    replace-with-text( "ä" );
  } else if( @name="ouml" ) {
    replace-with-text( "ö" );
  } else if( @name="uuml" ) {
    replace-with-text( "ü" );
  }
}

will result in the following XML:

<city>München</city>

10.32. set-attr

set-attr(attrName as Id, attrValue as Value) as Bool

attrName (Id): qualified attribute name
attrValue (Value): value to set attribute to

Creates or sets an attribute of name attrName with value attrValue on the context element. The value is converted to a String before being set as if using the to-string() function on the value.

The function returns true when the attribute could be set, false otherwise. It throws an exception when trying to set an attribute on a non-element node, or if the respective attribute is not allowed to be set on the context element (e.g. uci:class on an element other than uci:inline or uci:par).

Attributes uci:class and uci:rawclass

Setting uci:class or uci:rawclass also updates style properties (cssc and css namespaces) accordingly. If no style with the requested style name is found in the document's stylesheet, a warning is issued and the value is not set. Setting uci:class updates uci:rawclass accordingly as well (and vice versa).

Important

You cannot set attributes in the css namespace http://www.infinity-loop.de/namespace/2006/upcast-css. Attributes in that namespace are read-only. Trying to set an attribute in this namespace will throw an EvalException.

10.33. set-attr

set-attr(target as String, attrName as Id, attrValue as Value) as Bool

target (String): XPath 1.0 expression selecting node(s) to set attribute on
attrName (Id): qualified attribute name
attrValue (Value): attribute value (converted to String)

Creates or sets an attribute of name attrName with value attrValue on all element nodes selected by the target XPath 1.0 expression (evaluated relative from the context node). The value is converted to a String before being set as if using the to-string() function on the value.

The function returns true when the target expression selects at least one target element, false otherwise. It throws an exception when trying to set an attribute on a non-element node, or if the respective attribute is not allowed to be set on a selected element (e.g. uci:class on an element other than uci:inline or uci:par).

Attributes uci:class and uci:rawclass

Setting uci:class or uci:rawclass also updates style properties (cssc and css namespaces) accordingly. If no style with the requested style name is found in the document's stylesheet, a warning is issued and the value is not set. Setting uci:class updates uci:rawclass accordingly as well (and vice versa).

Important

You cannot set attributes in the css namespace http://www.infinity-loop.de/namespace/2006/upcast-css. Attributes in that namespace are read-only. Trying to set an attribute in this namespace will throw an EvalException.

Example 5.73. 

#namespace uci "http://www.infinity-loop.de/namespace/2006/upcast-internal";

[element(uci:par)] {
  set-attr( "descendant::uci:image", inPara, true() );
}

will set the attribute inPara to value "true" on all descendant uci:image elements in a paragraph.


10.34. specifies

specifies(propertyName as Id) as Bool

propertyName (Id): qualified attribute name

This function returns true when the given attribute or CSS property propertyName is actually specified on the current context node itself (as opposed to its value merely being available by inheritance from an ancestor element), and false otherwise.

The special upCast namespaces for CSS properties are handled in the following manner:

csso:<propname>

The function returns true when on the context node, the CSS property propname is explicitly specified as a local style override on that node.

cssc:<propname>

The function returns true when on the context node, the CSS property propname is specified by way of a reference to a named style rule. This means that as a consequence, this node also specifies a uci:class attribute, i.e. specifies( uci:class ) is also always true on such a node.

css:<propname>

The function returns true when the property propname was explicitly specified on that node either by a local style override or a named style reference. Effectively, this is a shortcut for (specifies( csso:propname ) or specifies( cssc:propname )).

Example 5.74. 

<uci:inline uci:diffStyle="color: red;">
  <uci:inline uci:diffStyle="font-weight: bold;">Text that is red and bold.</uci:inline>
</uci:inline>

Given the XML above,

specifies( csso:color )

returns true for the outer uci:inline element and false for the inner uci:inline element. Contrast this to querying the color value, where

@css:color

will return "red" for both the outer and inner uci:inline element because the color is inherited from the outer element to the inner.


10.35. string

string() as String

Returns the concatenated PCDATA content of the descendant Text nodes of the context node (i.e. "the text content").

10.36. text

text() as Bool

Tests whether the context node is a Text node.

11. Numeric functions

11.1. abs

abs(val as Numeric) as Numeric

val (Numeric): numeric value

Returns the absolute value of the Numeric val.

Example 5.75. 

abs( -3 )

returns 3

abs( -1.5in )

returns 2160tw (=1.5in)

abs( 12.67 )

returns 12.67


11.2. avg

avg(values as List) as Numeric

values (List): List of Numerics

Returns the arithmetic average of the elements in the List values.

All Numerics in values must have the same power. The function throws an EvalException if they don't.

The function throws an EvalException if values is the empty list.

The function throws an EvalException if at least one value in values cannot be cast to a Numeric.
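
A few illustrative calls in the style of the other numeric examples. The results follow from the description above and from the tw notation used elsewhere in this chapter. They are a reading of the specification, not captured program output.

avg( { 2, 4, 6 } )

should return 4

avg( { 1in, 3in } )

should return 2880tw (=2in)

avg( { 10, 20mm } )

throws an EvalException because the values do not have the same power

avg( { } )

throws an EvalException because the list is empty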

11.3. max

max(v1 as Numeric, v2 as Numeric) as Numeric

v1 (Numeric): numeric value
v2 (Numeric): numeric value

Returns the maximum of the two Numerics v1 and v2. Both must have the same power.

Example 5.76. 

max( -3, 5.2 )

returns 5.2

max( 1in, 15mm )

returns 1440tw (=1in)

max( 10, 20mm )

throws an exception, because the two values do not have the same power


11.4. max

max(values as List) as Numeric

values (List): List of Numeric values

Returns the maximum of the Numeric values contained in the passed List values.

All elements must have the same power. An EvalException is thrown when they don't.

An EvalException is also thrown when values contains an element that cannot be cast to Numeric.

Example 5.77. 

max( { 3, 4, 2} )

returns 4

max( { "3.0", 4, "2.0" } )

returns 4

max( { 10, 20mm } )

throws an exception because the values do not have the same power

max( { 10, "five" } )

throws an exception because the string "five" cannot be cast to a Numeric


11.5. min

min(v1 as Numeric, v2 as Numeric) as Numeric

v1 (Numeric): numeric value
v2 (Numeric): numeric value

Returns the minimum of the two Numerics v1 and v2. Both must have the same power.

Example 5.78. 

min( -3, 5.2 )

returns -3

min( 3cm, 1in )

returns 1440tw (=1in)

min( 10, 20mm )

throws an exception, because the two values do not have the same power


11.6. min

min(values as List) as Numeric

values (List): List of Numeric values

Returns the minimum of the Numeric values contained in the passed List values.

All elements must have the same power. An EvalException is thrown when they don't.

An EvalException is also thrown when values contains an element that cannot be cast to Numeric.

Example 5.79. 

min( { 3, 4, 2} )

returns 2

min( { "3.0", 4, "2.0" } )

returns 2

min( { 10, 20mm } )

throws an exception because the values do not have the same power

min( { 10, "five" } )

throws an exception because the string "five" cannot be cast to a Numeric


11.7. sqrt

sqrt(val as Numeric) as Numeric

val (Numeric): numeric value

Returns the square root of the Numeric val.

Example 5.80. 

sqrt( 4 )

returns 2

sqrt( 8in * 2in )

returns 5760tw (=4in)

sqrt( 4in )

throws an EvalException since the dimension is only a simple length, not an area dimension


11.8. sum

sum(values as List) as Numeric

values (List): List of Numerics

Returns the sum of the elements in the List values.

All Numerics in values must have the same power. The function throws an EvalException if they don't.

The function throws an EvalException if values is the empty list.

The function throws an EvalException if at least one value in values cannot be cast to a Numeric.
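
A few illustrative calls in the style of the other numeric examples. The results follow from the description above and from the tw notation used elsewhere in this chapter. They are a reading of the specification, not captured program output.

sum( { 1, 2, 3.5 } )

should return 6.5

sum( { 1in, 0.5in } )

should return 2160tw (=1.5in)

sum( { 10, 20mm } )

throws an EvalException because the values do not have the same power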

12. Other functions

12.1. app-buildnumber

app-buildnumber() as Numeric

This function returns the build number (as integer) of the upCast application running it.

You can use this to determine whether the build is high enough (or within a range) in which all required functions will be available, or you can make sure that you are running on a build where some important bug fix your code relies on is implemented.
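
A minimal sketch of such a guard. The function name checkBuild and the build number 3000 are hypothetical. Substitute the build that actually introduced the feature or fix you depend on.

function checkBuild() {
  if( app-buildnumber() < 3000 ) {
    // running on an older build than this script was written for
    debug( "This script expects upCast build 3000 or later, found", app-buildnumber() );
  }
}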

12.2. debug

debug(items as Value, ...) as Void

items (Value): variable list of values to print

Outputs the string values of the individual elements in items in a special debug format on the system console.

12.3. delay

delay(milliseconds as Numeric) as Void

milliseconds (Numeric): number of milliseconds as integer

This method pauses execution for the specified number of milliseconds.

Example 5.81. 

This lets you create a simple watched folder functionality from right within UPL, where you process any files in a folder every 60 seconds:

function watchFolder( $folder as String ) {
  while( true() ) { // Loop forever until user clicks Stop
    variable $files as List := { };
    variable $f as String :="";

    $files := list-files( $folder );
    for-each( $f in $files ) {
      process-file( $f );
    }

    delay( 60000 ); // Wait 60 seconds
  }
}

12.4. entering

entering() as Bool

This function lets you query the processing state of the rule for the current node. It returns true when the processing state is on entering the node, i.e. before processing its children, and false otherwise.

Note

You must enable leaveEvents support in UPL before you can take different actions in a rule depending on whether you enter or leave a node. leaveEvents is false (=off) by default, and entering() will always return true in this case.
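
A sketch of a rule that distinguishes the two states. It assumes that leaveEvents has already been enabled (how to do that is described elsewhere in this specification), and the element name section is made up for illustration.

[element(section)] {
  if( entering() ) {
    // reached before the children of section are processed
    debug( "entering", name() );
  } else {
    // reached after the children have been processed (requires leaveEvents)
    debug( "leaving", name() );
  }
}

With leaveEvents off, only the first branch is ever taken.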

12.5. eval-xpath

eval-xpath(xpathExpression as String) as List

xpathExpression (String): XPath 1.0 expression

This function lets you evaluate an XPath 1.0 expression against the internal document tree, the XPath context node being the same as the UPL context node.

The result is always a List.

This function is a shorthand for eval-xpath() with the mode parameter set to UPCAST – see the function description there for further details.

eval-xpath( "xpath-expression" )

is equivalent to

eval-xpath( "xpath-expression", UPCAST )

12.6. eval-xpath

eval-xpath(xpathExpression as String, mode as Id) as List

xpathExpression (String): XPath 1.0 expression
mode (Id): UPCAST | STRING-VALUE

This function lets you evaluate the XPath 1.0 expression xpathExpression against the internal document tree, the XPath context node being the same as the UPL context node.

The result is always a List, even when no or only one item is selected by the XPath expression. The result list contains an element for each of the items selected by the XPath expression.

When no items have been selected by the XPath expression, an empty List is returned.

The mode parameter controls how selected items are returned as UPL values:

UPCAST

the selected items are returned by applying the rules listed in the table below for this mode

STRING-VALUE

the selected items are all returned as String. Their values are the same as if on each of them, the XPath function string() had been applied.

The items returned by the XPath engine are converted to UPL values as follows, depending on the selected mode. Each entry lists the XPath result type, then the UPL value type, then any remarks.

mode = UPCAST:

Document, Element: String. The XML serialization of the document or element, without XML declaration, UTF-8 encoding.
Attribute: String. The value of the attribute.
Text: String. The text content of the Text node.
java.lang.Boolean: Bool.
java.lang.Integer, java.lang.Long, java.lang.Float, java.lang.Double: Numeric.
any other: String. Created by performing toString() on the returned Java object representation.

mode = STRING-VALUE:

any result type: String. The XPath string value of the item, i.e. as if calling string() on it.

The implementation currently uses the Jaxen XPath engine.

Example 5.82. 

For the document

<doc>
    <par>Some <bold>important</bold> text.</par>
</doc>

and the context node being the doc element, the result of calling

eval-xpath( "descendant::*", UPCAST )

would be

{
  "<par>Some <bold>important</bold> text.</par>",
  "<bold>important</bold>"
}

whereas the result of

eval-xpath( "descendant::*", STRING-VALUE )

would be

{
  "Some important text.",
  "important"
}
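
As a further hedged illustration of the conversion rules for non-node results: for the same document and context node, a call like

eval-xpath( "count(descendant::*)" )

would be expected to return a List containing the single Numeric value 2, since the XPath engine reports the count as a Java number and the doc element has two descendant elements (par and bold). This is inferred from the conversion rules above, not from captured output.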

XPath extension functions

The XPath 1 implementation in upCast defines the following additional functions you may use in your expression:

Function

Description

current()

returns the UPL context node within which eval-xpath() is being called

is-same-node( n1, n2 )

returns true if n1 and n2 are the same node (node identity)

This is equivalent to XPath 2's is operator.

node-before( n1, n2 )

true when n1 comes before n2 in document order, false otherwise

This is equivalent to XPath 2's << operator.

node-after( n1, n2 )

true when n1 comes after n2 in document order, false otherwise

This is equivalent to XPath 2's >> operator.

is-descendant( n1, n2 )

true when n1 is a descendant node of n2, false otherwise

is-ancestor( n1, n2 )

true when n1 is an ancestor node of n2, false otherwise

12.7. generate-uuid

generate-uuid() as String

Generates a UUID (universally unique identifier) and returns it as String.

Note:

The implementation is based on the following Java code and therefore behaves similarly:

UUID.randomUUID().toString()
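
A minimal usage sketch. The attribute name uuid is made up for illustration.

#namespace uci "http://www.infinity-loop.de/namespace/2006/upcast-internal";

[element(uci:par)] {
  // give every paragraph its own identifier
  set-attr( uuid, generate-uuid() );
}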

12.8. get-environment-value

get-environment-value(key as Id) as Value

key (Id): environment value key

This function lets you query several environment variables. Available key values and their meaning are described here.

12.9. get-outline-level

get-outline-level() as Numeric

Returns the outline level (as specified in an RTF document imported by the RTF Importer) of the context node, if it is a paragraph. The returned value is an integer between 1 and 9 for a paragraph if it has an outline level assigned.

The function returns 0 for a paragraph at body text level or if the context node is not a paragraph element.
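
A sketch that records the outline level on heading paragraphs. The attribute name outline-level is made up for illustration, and declaring a variable inside a rule body is assumed to work as it does inside a function body.

#namespace uci "http://www.infinity-loop.de/namespace/2006/upcast-internal";

[element(uci:par)] {
  variable $level as Numeric := 0;
  $level := get-outline-level();
  if( $level > 0 ) {
    // paragraph has an outline level between 1 and 9 assigned
    set-attr( outline-level, $level );
  }
}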

12.10. get-realm-value-names

get-realm-value-names(realm as Id) as List

realm (Id): realm name

Returns a List of all variable names stored and available in the specified realm.

This method is only defined for the realms pipeline and module.
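
A minimal sketch that lists what is available in both supported realms. It assumes the realm names are passed as the plain Ids pipeline and module, and it relies on debug() printing the elements of a List.

debug( "pipeline realm:", get-realm-value-names( pipeline ) );
debug( "module realm:", get-realm-value-names( module ) );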

12.11. get-rulemode

get-rulemode() as String

Returns the currently set rule mode. See set-rulemode().

12.12. get-var

get-var(varname as Id) as Value

varname (Id): qualified variable name

Returns the value of the in-scope UPL variable with the same name as the Id passed in varname.

If (and only if) the namespace of the Id is one of the variable realm namespaces, the value returned is the same as $varname would have returned. This construction is useful when the name of the variable to fetch was calculated at runtime or retrieved from get-realm-value-names(). When the specified realm variable does not exist, an EvalException is thrown; you can test for the existence of a realm variable using exists-var().

The method performs a type coercion for realm variables according to the following table:

Source type          Returned type
java.lang.Boolean    Bool
java.lang.Double     Numeric
java.lang.Float      Numeric
java.lang.Integer    Numeric
java.lang.String     String
java.util.List       List
java.lang.Object     String
null                 Null

Example 5.83. 

get-var( pipeline:SourceFile )

will return the current value of the pipeline variable SourceFile.


12.13. get-var

get-var(varname as Id, defaultValue as Value) as Value

varname (Id): qualified variable name
defaultValue (Value): default value as fallback

Returns the value of the in-scope UPL variable with the same name as the Id passed in varname.

If (and only if) the namespace of the Id is one of the variable realm namespaces, the value returned is the same as $varname would have returned. This construction is useful when the name of the variable to fetch was calculated at runtime or retrieved from get-realm-value-names(). When the specified realm variable does not exist or otherwise cannot be retrieved, defaultValue is returned.

The method performs a type coercion for realm variables according to the following table:

Source type          Returned type
java.lang.Boolean    Bool
java.lang.Double     Numeric
java.lang.Float      Numeric
java.lang.Integer    Numeric
java.lang.String     String
java.util.List       List
java.lang.Object     String
null                 Null

Example 5.84. 

get-var( pipeline:width, 1.0in )

will return the current value of the pipeline variable width. In case it is not defined at this point, the length 1.0in is returned instead.


12.14. hoist-attr

hoist-attr(hoistAttr as Id, finalAttr as Id) as Void

hoistAttr (Id): qualified name of attribute to hoist
finalAttr (Id): qualified name to use for final hoisted attribute

This function is a (very…) special-purpose function to hoist identical values of a certain attribute as far up (in direction to the root of the document) in the tree as possible, with the context node being the upmost element to hoist to (the hoisting border).

hoistAttr is the name of the attribute that you want to hoist (move up).

finalAttr is the name of the attribute that should be created at the topmost possible element of the hoisting operation with the value of the attribute that was hoisted.

Important

hoistAttr and finalAttr must not be the same Id!

Example 5.85. 

With an UPL Tree Processor and the following code,

[element(d)] {
  hoist-attr( a, a-hoisted );
}

from this source

<?xml version="1.0" encoding="UTF-8"?>
<d a="10">
  <p a="12">
    <i a="8"><i2>x</i2>A</i>
    <i a="8">B</i>
  </p>
  <p a="12">
    <i>C</i>
    <i a="12">D</i>
  </p>
</d>

you'll get this result:

<?xml version="1.0" encoding="UTF-8"?>
<d a="10">
  <p a="12" a-hoisted="8">
    <i a="8"><i2>x</i2>A</i>
    <i a="8">B</i>
  </p>
  <p a="12" a-hoisted="12">
    <i>C</i>
    <i a="12">D</i>
  </p>
</d>

As you see, the value of attribute a of "8" has been hoisted onto the first p child of the d element. This is because all children of that p element have the value "8" for attribute a. Note that the setting of a to "12" on the first p has no effect on its children, since they immediately re-set that attribute to "8".

The value of attribute a cannot be hoisted all the way to the context node d as the two values for a are different for its two children (namely "8" and "12").


12.15. hoist-attr

hoist-attr(hoistAttr as Id, finalAttr as Id, finalExcludes as List) as Void

hoistAttr       Id     qualified name of attribute to hoist
finalAttr       Id     qualified name to use for final hoisted attribute
finalExcludes   List   List of qualified element names as Id

This is a (very) special-purpose function that hoists identical values of a given attribute as far up the tree (toward the root of the document) as possible. The context node is the topmost element to hoist to (the hoisting border).

hoistAttr is the name of the attribute that you want to hoist (move up).

finalAttr is the name of the attribute that should be created at the topmost possible element of the hoisting operation with the value of the attribute that was hoisted.

Important

hoistAttr and finalAttr must not be the same Id!

finalExcludes is a List of element names (Ids) that are not allowed as a valid target for a topmost possible element. If an element would be (algorithmically) the topmost hoisting target, and it is in the finalExcludes list, the hoisted attribute will be created on each child of the excluded target element.

Example 5.86. 

With a UPL Tree Processor and the following code,

[element(d)] {
  hoist-attr( a, a-hoisted );
}

from this source

<?xml version="1.0" encoding="UTF-8"?>
<d a="10">
  <p a="12">
    <i a="8"><i2>x</i2>A</i>
    <i a="8">B</i>
  </p>
  <p a="12">
    <i>C</i>
    <i a="12">D</i>
  </p>
</d>

you'll get this result:

<?xml version="1.0" encoding="UTF-8"?>
<d a="10">
  <p a="12" a-hoisted="8">
    <i a="8"><i2>x</i2>A</i>
    <i a="8">B</i>
  </p>
  <p a="12" a-hoisted="12">
    <i>C</i>
    <i a="12">D</i>
  </p>
</d>

As you can see, the value "8" of attribute a has been hoisted onto the first p child of the d element, because all children of that p element have the value "8" for attribute a. Note that setting a to "12" on the first p has no effect on its children, since they immediately re-set that attribute to "8".

The value of attribute a cannot be hoisted all the way to the context node d as the two values for a are different for its two children (namely "8" and "12").


12.16. hoist-single-listpar

hoist-single-listpar() as Void

This method, when called with a uci:par element as context node for which single-listpar-level() returns a value greater than zero, removes all surrounding list structures and moves the leaf paragraph to the top (replacing the former top-level uci:list item).

Example 5.87. 

Calling the method

hoist-single-listpar()

on the uci:par element as context node with a structure like this:

<uci:list>
    <uci:item>
        <uci:par>Heading 1</uci:par>
    </uci:item>
</uci:list>

results in a document tree that looks like this after the invocation:

<uci:par>Heading 1</uci:par>

A potentially useful sequence of code would be a rule like this:

[element(uci:par) and single-listpar-level() > 0]
{
  set-heading-level( single-listpar-level() );
  hoist-single-listpar();
}

This will effectively make a heading (with corresponding sectioning in a subsequent Sectioner module in the pipeline) for all "headings" constructed in this way.

Tip

To reduce the possibility of making single-item, "real" lists a heading, you may want to add some more conditions to the selector, e.g. base your decision also on font size of the paragraph, length of text (headings are usually rather short) or similar.


12.17. html-to-image

html-to-image(srcUrl as String, destImage as String) as List

srcUrl      String   source XHTML document
destImage   String   path to write image to

This function renders the XHTML document located at srcUrl to a bitmap image using the current system monitor's dpi resolution (or at 72ppi if running in Java headless mode) and writes it in PNG format to the specified destImage file.

The function returns a list of two-element lists (i.e. a key/value map implemented using UPL's List datatype). The following keys (as Id) may be returned:

status

- a Bool value, true when the conversion was successful, false otherwise; this key will always be present in the result

width

- a Numeric specifying the width in pixels of the generated image

height

- a Numeric specifying the height in pixels of the generated image

dpi

- a Numeric specifying the dpi of the generated image

Example 5.88. 

The following call

html-to-image( "http://www.upcast.de/", $pipeline:base + "/output.png" )

will render the upCast homepage into a pixmap and write it as PNG to the file output.png (at the same level as the calling .ucdoc).

The call might return a nested list like the following:

{
  {status,true},
  {width,666},
  {height,781},
  {dpi,72}
}
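Because only the status key is guaranteed to be present, code that consumes such a key/value list should check it before reading the other entries. The following Java sketch illustrates the lookup pattern on a nested list shaped like the one above. It is not upCast code: the class PairListSketch and its lookup method are hypothetical names, and the values are copied from the example rather than produced by a real rendering.

```java
import java.util.Arrays;
import java.util.List;

final class PairListSketch {
    /** Returns the value stored under key, or null when the key is absent. */
    static Object lookup(List<List<Object>> pairs, String key) {
        for (List<Object> pair : pairs) {
            if (pair.size() == 2 && key.equals(String.valueOf(pair.get(0)))) {
                return pair.get(1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical result, mirroring the nested list shown in the example.
        List<List<Object>> result = Arrays.asList(
            Arrays.<Object>asList("status", true),
            Arrays.<Object>asList("width", 666),
            Arrays.<Object>asList("height", 781),
            Arrays.<Object>asList("dpi", 72));

        // Only "status" is always present, so test it before reading the rest.
        if (Boolean.TRUE.equals(lookup(result, "status"))) {
            System.out.println(lookup(result, "width") + " x " + lookup(result, "height"));
        }
    }
}
```

A missing key yields null in this sketch, so a failed conversion (status false, no width or height) can be handled without special cases.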

Note

The output of this function is backed by the LGPL-licensed, open-source Flying Saucer rendering engine.

The rendering engine supports transparency, most CSS 2.1 properties and XHTML 1.0 presentational element attributes. For details, see its project home.

12.18. leaving

leaving() as Bool

This function lets you query the processing state of the rule for the current node. It returns true when the processing state is on leaving the node, i.e. after having processed its children, and false otherwise.

Note

You must enable leaveEvents support in UPL before you can take different actions in a rule depending on whether you enter or leave a node. leaveEvents is false (=off) by default, and leaving() will always return false in this case.

12.19. markup-regex

markup-regex(regExpr as String, markupActions as List) as Bool

regExpr         String   regular expression
markupActions   List     list of actions for each group in regExpr

This function evaluates the regular expression on the plain character content of all descendants of the context node, optionally creating markup over the matched character sequences. This is similar to matches-list() with the source string being the CDATA content of the context node, but you can additionally specify one of several predefined actions to perform for each matching group. These actions are:

ignore()

ignore the group (i.e. leave everything as-is)

group-shallow(type)

group as child of the nearest common parent level of the group, using the specified type

group-deep(type)

group as direct child of the context node, using the specified type

group-custom-shallow(functionname)

group as child of the nearest common parent of the matching run, using the value returned by the custom UPL function functionname for the uci:type attribute. The evaluation context node for the custom function is the grouping element just created.

group-custom-deep(functionname)

group as direct child of the context node, using the value returned by the custom UPL function functionname for the uci:type attribute. The evaluation context node for the custom function is the grouping element just created.

delete-shallow()

delete the group's contents as if by doing a group-shallow(), then deleting that

delete-deep()

delete the group's contents as if by doing a group-deep(), then deleting that

replace-shallow(…xml source fragment…)

replaces the matching run's contents with the XML tree fragment parsed from the action's argument as if by doing a group-shallow(), then replacing that group as described

replace-deep(…xml source fragment…)

replaces the matching run's contents with the XML tree fragment parsed from the action's argument as if by doing a group-deep(), then replacing that group as described

replace-custom-shallow(functionname)

replaces the matching run's contents with the XML tree fragment parsed from the String returned from the custom UPL function named like the action's argument as if by doing a group-shallow(), then replacing that group as described

replace-custom-deep(functionname)

replaces the matching run's contents with the XML tree fragment parsed from the String returned from the custom UPL function named like the action's argument as if by doing a group-deep(), then replacing that group as described

For actions replace-custom-shallow() and replace-custom-deep(), the custom UPL function whose name is specified as parameter must have the following signature:

function functionname( $current-group as String, $position as Numeric, $groups as List) as String

where:

current-group

is the text of the current group, i.e. the group for which the function is called

position

is the number of the group, with 0 being the complete matched pattern, and a value between 1 and n (the number of defined groups in the pattern) the groups from left to right in the pattern

groups

is a List of Strings of all the matched groups in the pattern, with the first element being the complete match of the pattern, and indices 2 (first group) to n+1 (n being the number of defined groups in the pattern) their respective text contents. Note that due to List indices starting at 1, the following holds: $current-group = value-at( $groups, $position + 1 )

The function must return a (well-formed, if elements are contained) XML fragment serialized to a String. That returned value is then parsed into an internal XML tree representation and replaces the complete group (incl. its children) that would have been created at that position in the tree.
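The relationship between position and the 1-based groups list can be checked against Java's regular expression API, which backs this function. The sketch below is an illustration under assumptions, not the upCast implementation: the names GroupsListDemo, groupsOf and valueAt are hypothetical, and valueAt merely mimics the 1-based access of value-at().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupsListDemo {
    /** Collects the complete match (position 0) followed by groups 1..n. */
    static List<String> groupsOf(Matcher m) {
        List<String> groups = new ArrayList<>();
        for (int position = 0; position <= m.groupCount(); position++) {
            groups.add(m.group(position));
        }
        return groups;
    }

    /** Mimics 1-based list access as in value-at( $groups, index ). */
    static String valueAt(List<String> list, int index) {
        return list.get(index - 1);
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(\\[)(ID\\d+)(\\])").matcher("[ID42] A paragraph");
        if (m.find()) {
            List<String> groups = groupsOf(m);
            // For every position, the current group equals valueAt(groups, position + 1).
            for (int position = 0; position <= m.groupCount(); position++) {
                System.out.println(position + ": " + valueAt(groups, position + 1));
            }
        }
    }
}
```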

For actions other than replace-…(), on grouping, by default the function will wrap the character run to be grouped by either an uci:inline element (when currently at inline level) or a uci:block element (when currently at block level) with an uci:type attribute that has the value of the specified type id, unless you specified a different element name using the #set groupQName option.

The function returns true when there was at least one match for the regular expression, false otherwise.

Note on repeating groups

This function is backed by Java's regular expression support. As such, please note that you do not get a match for each occurrence of a repeating group, but only the last repetition that matched.

Example 5.89. 

With the context node

<uci:par>1.2.3.</uci:par>

and the call

markup-regex( "([0-9]\\.)*", { "ignore()", "group-shallow(levelnum)" } )

the result will be

<uci:par>1.2.<uci:inline uci:type="levelnum">3.</uci:inline></uci:par>

The pattern group matches on "1." and "2." will not be grouped as only the last match of the repeating group is available from the regular expression engine.


We therefore recommend not to use repeating groups in patterns at all or – if you are aware of this limitation – only for groups whose associated action is "ignore()".
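The limitation can be reproduced with java.util.regex directly. The following minimal sketch (the class name RepeatingGroupDemo is hypothetical, and no upCast API is involved) applies the pattern from the example to the same text and prints what the engine reports for group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RepeatingGroupDemo {
    /** Returns the text captured by group 1 for the first match, or null. */
    static String lastRepetition(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The group repeats three times, but only the last repetition is retained.
        System.out.println(lastRepetition("([0-9]\\.)*", "1.2.3.")); // prints "3."
    }
}
```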

The workings of this function are best described with an example. Suppose we have the following XML fragment, the context node being the <p> element:

<p>[ID42] A paragraph with an <link>http://www.abc.de/<ins>new/ link</ins></link>.</p>

Let's have a look at some example applications of markup-regex():

Example 5.90. Marking up a textual ID

Let's say we want to mark up the text ID42 with an element, at the same time discarding the surrounding square brackets. We'd write code like this:

markup-regex( "(\\[)(ID\\d+)(\\])",
    { "ignore()", "delete-shallow()", "group-shallow(myid)", "delete-shallow()" } );

We make three groups out of a matching string: (1) the leading square bracket, (2) the actual ID string, and (3) the trailing square bracket. We must therefore give four actions in the actions list, where the first action is the one applied to the complete pattern match.

We do nothing with the complete pattern match, so we specify "ignore()". We want to delete the first group (the leading square bracket), so we specify "delete-shallow()" for it. Next comes the ID group, which we want to mark up using an inline element with a type attribute value of myid; we therefore use "group-shallow(myid)". The third group (the trailing square bracket) is to be deleted like the first, so we again use "delete-shallow()".

The result after executing the above markup-regex() function will be:

<p><inline type="myid">ID42</inline> A paragraph with an <link>http://www.abc.de/<ins>new/ link</ins></link>.</p>

Example 5.91. Marking up the complete URL string (shallow)

Now, we want to mark up just the full URL string, i.e. the text http://www.abc.de/new/. We'll use markup-regex() as follows:

markup-regex( "http[^\\s]+", { "group-shallow(url)" } );

As you see, we detect links by their prefix "http". We then make the simple assumption that a link ends at the first whitespace character following. Now, you'll see that the start and end of the character run we wish to markup will not form a well-formed subdivision in the XML fragment – so, what happens?

Here's a step-by-step schema of what's happening internally:

In Step 1, the complete sequence of text node children is considered for matching against the regular expression, and the character offsets for the start and end points of a matching group are determined.

In Step 2, the splitting of nodes is performed. Since it is a shallow grouping action, the function will only split and cut up to the first common ancestor of both the matching group's start and end point, which in this example is the link element. It is important to see that for this splitting action, the ins element must be duplicated (which is exactly what happens internally). The algorithm tries to take care of any known ID type attributes on any split elements and remove them from the copy so that the document remains valid. All meta data attached to the node being split is also copied.

In Step 3, a new grouping element is created. This is either block (when splitting above paragraph level), or – as in our example – inline if the split is in mixed content (inline level). Both of these elements will reside in the upCast Internal namespace with the default prefix uci. The grouping element gets exactly one attribute, uci:type, which holds as value the type you specified in the action.

The resulting XML serialization would look like:

<p>[ID42] A paragraph with an <link><inline type="url">http://www.abc.de/<ins>new/</ins></inline><ins> link</ins></link>.</p>

Example 5.92. Marking up the complete URL string (deep)

Finally, we again want to mark up just the full URL string, i.e. the text http://www.abc.de/new/. We'll use markup-regex() as follows:

markup-regex( "http[^\\s]+", { "group-deep(url)" } );

The difference between this and the preceding example is that we now do a group-deep(), i.e. we want to make sure the grouping element is an immediate child of the context node. In this case, we need to cut through the tree until we reach the context node. Here's a step-by-step schema of what's happening internally:

In Step 1, the complete sequence of text node children is considered for matching against the regular expression, and the character offsets for the start and end points of a matching group are determined.

In Step 2, the splitting of nodes is performed. Since it is a deep grouping action, the function will split and cut up the ancestor axis until the context node p is reached. It is important to see that for this splitting action, the ins and the link elements must be duplicated (which is exactly what happens internally).

In Step 3, a new grouping element is created as described.

The resulting XML serialization would look like:

<p>[ID42] A paragraph with an <inline type="url"><link>http://www.abc.de/<ins>new/</ins></link></inline><link><ins> link</ins></link>.</p>

Note

Note that in the UPL example code above, we needed to escape each backslash character with a second backslash because the backslash is used in UPL as an escape character.

Example 5.93. Marking up levels in a multi-level numbering string

Here's an example of how you might mark up the individual numbering levels in a multi-level numbering string where the separator character is the dot (.). Suppose we have the following context node:

<uci:par>2.4.3 Using markup-regex()</uci:par>

You may then mark up the individual components of the numbering (Word supports nesting up to 9 levels deep, but we show only 5 here for clarity) using the following code:

markup-regex( "^(\\d+\\.?)(\\d+\\.?)?(\\d+\\.?)?(\\d+\\.?)?(\\d+\\.?)?", { "ignore()", "group-shallow(level1)", 
    "group-shallow(level2)", "group-shallow(level3)", "group-shallow(level4)", "group-shallow(level5)" } );

This will work for all numbering strings that contain at most 5 levels. It will work unchanged with numbering strings that have fewer levels, as all groups except the first one are optional.

The result of the above will be:

<uci:par><uci:inline uci:type="level1">2.</uci:inline><uci:inline uci:type="level2">4.</uci:inline><uci:inline uci:type="level3">3</uci:inline> Using markup-regex()</uci:par>
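To see why fewer levels are handled without changes, it helps to look at what the underlying Java regular expression engine reports for the optional groups. The sketch below is an assumption-laden illustration (the class NumberingLevelsDemo and its levels method are hypothetical): groups that did not take part in the match are returned as null by the engine, and the sketch skips them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumberingLevelsDemo {
    // Same pattern as in the example: one mandatory and four optional levels.
    static final Pattern LEVELS = Pattern.compile(
        "^(\\d+\\.?)(\\d+\\.?)?(\\d+\\.?)?(\\d+\\.?)?(\\d+\\.?)?");

    /** Returns the text of every level group that took part in the match. */
    static List<String> levels(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = LEVELS.matcher(text);
        if (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                if (m.group(g) != null) { // unmatched optional groups are null
                    result.add(m.group(g));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(levels("2.4.3 Using markup-regex()")); // prints [2., 4., 3]
    }
}
```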

Example 5.94. Custom replace function: Keeping markup intact

Suppose the following XML:

<root>
  <p>fax 555-1234</p>
  <p>Fax 555-1234</p>
  <p>FAX 555-1234</p>
  <p><b>f</b>ax 555-1234</p>
</root>

We want to normalize the writing of "fax" to full uppercase, but keep any formatting intact. Note that in the last p, the 'f' of "fax" is bolded. We might use the following UPL code in a UPL Tree Processor:

[element(p)]{
  // normalize "Fax" et al. into "FAX", keeping style properties intact
  markup-regex( "([fF])([aA])([xX])\\s", { "ignore()","replace-custom-shallow(replace-with-uppercase)", "replace-custom-shallow(replace-with-uppercase)", "replace-custom-shallow(replace-with-uppercase)" } );
}

function replace-with-uppercase( $current-group as String, $position as Numeric, $groups as List ) as String 
{
  return upper-case( $current-group );
}

This yields the result

<root>
  <p>FAX 555-1234</p>
  <p>FAX 555-1234</p>
  <p>FAX 555-1234</p>
  <p><b>F</b>AX 555-1234</p>
</root>

i.e. it normalizes the spelling to "FAX" while keeping the style properties (even of individual characters) within the word.


12.20. print

print(items as Value, ...) as Void

items   Value   value(s) to print to stdout

This function prints all its parameters to stdout (using System.out.print()).

For parameters other than String, the output will be as if passing the parameter to the to-string() function first.

12.21. println

println(items as Value, ...) as Void

items   Value   value(s) to print to stdout

This function prints all its parameters to stdout (using System.out.print(…)) and finally appends the newline character for the platform we are running on (using System.out.println()).

For parameters other than String, the output will be as if passing the parameter to the to-string() function first.

12.22. run-module

run-module(moduleID as Id, parameters as List) as Value

moduleID     Id     module id (see below)
parameters   List   list of module parameters

This method allows you to run a regular upCast pipeline module instance from anywhere within an UPL function or action part.

The moduleID parameter lets you set the module class you wish to run an instance of by ID. These are as specified in the upCast manual for the various modules:

  • pipelinevars

  • rtfimport

  • upl

  • uplcode

  • sectioner

  • grouper

  • xmlexport

  • css

  • commandline

  • unicodetranslator

  • validator

  • rtfexport

  • xslt

  • xmlimport

  • extpipeline

To set parameters on the created instance of the module, the function expects a List of Lists as its second parameter. Each element of the outer list represents one parameter, which itself is represented by an ordered, two-element list { name, value }.

The parameter list can be constructed programmatically, or specified as a constant directly in the call like in the example below.

The function returns the module's result value as delivered by the module in pipeline:ModuleResult. If the module chooses to not set that variable, the Numeric 0 (zero) is returned to indicate successful execution.

The function throws an EvalException when the executed module requests pipeline termination by signalling TERMINATE from its finalize function.

Example 5.95. 

Call the XSLT Processor module directly from within UPL code:

#namespace pipeline "http://www.infinity-loop.de/namespace/upcast-realm/pipeline";

run-module( xslt,
  {
    { "SourceFile", $pipeline:SourceFile },
    { "Stylesheet", $pipeline:PipelineBase + "/Resources/xslt/transformation.xsl" },
    { "DestinationFile", $pipeline:DestinationFolder + "/out.xml" },
    { "XSLTProcessor", "saxon" },
    { "StylesheetParameters", "debugmode=\"1\" rootelem=\"document\"" }
  }
);

12.23. set-grouping

set-grouping(groupingFlag as Bool) as Void

groupingFlag   Bool   boolean value indicating whether the context uci:part should group its contents (true) or not

This method, when applied to a uci:part element, determines whether that uci:part should group all contents up to the next occurrence of a uci:part (groupingFlag set to true), or whether it should be just an empty element serving as a position marker for the original section break in the imported RTF document (groupingFlag set to false).

The actual grouping is performed by a subsequent Sectioner module running on the internal tree.

12.24. set-heading-level

set-heading-level(level as Numeric) as Void

level   Numeric   heading level to set

Sets the uci:heading-level attribute on the context node (if it is an uci:par element; otherwise, this method does nothing). This information is used by a subsequent Sectioner module to create a corresponding uci:section element nesting.

12.25. set-process-children

set-process-children(doProcess as Bool) as Void

doProcess   Bool   boolean value

This method lets you set whether you want the context node's children to be processed. When doProcess is true, the UPL processor will continue processing the context node's children. When it is false, processing will continue with the following (as per XPath use of this term) node.

This method is only useful when called on entering node processing state (as on leaving, children will already have been processed). The default value for each new context node is true, i.e. children will be processed.

12.26. set-rulemode

set-rulemode(ruleMode as String) as Void

ruleMode   String   break | continue | exit | jump:label

Sets the rule mode for this rule.

Each rule can decide how to proceed once its actions have been executed. Normally, no further rules in the UPL program are considered; instead, the next node (or the next mode, when leave events are enabled and the execution mode changes) in document order is chosen and the UPL program is applied to it. However, you can also force evaluation to continue with subsequent selectors (in the order specified), abort UPL processing completely, or jump to a specific, labelled rule.

"break"

stop processing rules for the current context node and mode, and continue with the next mode or the next document node in document order. This is the default unless a rule sets a rule mode explicitly or a #set defaultRuleMode directive overrides it, in which case the directive's value is used as the default.

"continue"

continue processing the rules list and execute the actions of the next matching rule (if that exists)

"exit"

same as "break", but also stops any further tree traversal, meaning that this was the last node in the tree for which UPL rule application has been performed

"jump:label"

like "continue", but processing proceeds with the rule prefixed by the label label.

The rule mode is reset to its default value (either the value specified using a #set defaultRuleMode directive or, if that is not specified, "break") whenever the context node or execution mode changes.
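The interplay of the four rule modes can be pictured as a small dispatch loop over the ordered rule list. The Java sketch below is one possible reading of the description above, not the actual UPL processor: all names (RuleModeSketch, Rule, apply) are hypothetical, each rule carries a fixed mode instead of setting it from its actions, and it is an assumption that a jump target is only executed if its selector matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class RuleModeSketch {
    static final class Rule {
        final String label;
        final Predicate<String> selector;
        final String mode; // stands in for the mode a rule would set via set-rulemode()

        Rule(String label, Predicate<String> selector, String mode) {
            this.label = label;
            this.selector = selector;
            this.mode = mode;
        }
    }

    /** Returns the labels of the rules run for node; a trailing "exit" marks the end of traversal. */
    static List<String> apply(List<Rule> rules, String node) {
        List<String> ran = new ArrayList<>();
        int i = 0;
        while (i < rules.size()) {
            Rule rule = rules.get(i);
            if (!rule.selector.test(node)) { i++; continue; }
            ran.add(rule.label);
            if (rule.mode.equals("continue")) {
                i++;                                  // try the next rule in order
            } else if (rule.mode.startsWith("jump:")) {
                // Note: a backward jump could loop forever in this simplified sketch.
                i = indexOf(rules, rule.mode.substring("jump:".length()));
            } else {
                if (rule.mode.equals("exit")) ran.add("exit");
                break;                                // "break" and "exit" end rule processing
            }
        }
        return ran;
    }

    static int indexOf(List<Rule> rules, String label) {
        for (int i = 0; i < rules.size(); i++) {
            if (rules.get(i).label.equals(label)) return i;
        }
        throw new IllegalArgumentException("no rule labelled " + label);
    }
}
```

In a real program the mode would be reset to the default for every new context node or execution mode, which the sketch models by evaluating each node with a fresh call to apply.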

12.27. set-var

set-var(name as Id, value as Value) as Value

name    Id      qualified variable name
value   Value   value to set

Lets you set an UPL variable of name name to the new value value.

This function is useful when the name of the variable to set was calculated dynamically at runtime or retrieved from e.g. get-realm-value-names().

Note

You should always prefer the $varname := value notation except for those cases where the variable name is dynamically calculated at runtime.

Example 5.96. 

set-var( pipeline:SourceFile, "abc.txt" );

is equivalent to

$pipeline:SourceFile := "abc.txt";

If (and only if) the namespace of the name Id is one of the variable realm namespaces (and the realm is not a read-only realm), the value will be written to the respective realm variable. When the specified realm variable does not yet exist, it is created. You can test for the existence of a realm variable using exists-var().

This function performs a type coercion for realm variables according to the following table:

UPL Source type | Destination type
Value | java.lang.String
Bool | java.lang.Boolean
Numeric (dimensionless, e.g. 1 or 3.14) | java.lang.Double
Numeric (with dimension, e.g. 2in) | java.lang.String (result of to-string())
Color | java.lang.String (result of to-string(), format: "#rrggbb")
Id | java.lang.String (result of to-string())
String | java.lang.String
List | java.util.List (implementation: java.util.ArrayList), with each list element again coerced according to this table based on its respective type
Null | null

Example 5.97. 

set-var( pipeline:srcFiles, { "file1.rtf", "file2.rtf" } )

will write a java.util.List object consisting of the two java.lang.String objects "file1.rtf" and "file2.rtf" to the variable srcFiles in the pipeline realm. When retrieved again using e.g. get-var(), it will be back-converted to a corresponding UPL List object.


12.28. single-listpar-level

single-listpar-level() as Numeric

This method returns a value greater than zero if and only if the context node is an uci:par element and it is the only leaf element in a nested uci:list/uci:item structure.

Some authors use plain list numbering (instead of heading styles with outline level property) to create headings. When imported by the RTF Importer module, a structure similar to this one is created in this case:

<uci:list>
    <uci:item>
        <uci:par>Heading 1</uci:par>
    </uci:item>
</uci:list>

for what they want to be a heading at level 1. The above method will return 1 for this example when called with the uci:par node as context node, because that paragraph has no siblings and it is the only leaf within the uci:list/uci:item structure, and it is at the first uci:list nesting level.

You can use this for finding possible cases of thusly constructed "headings" and use that information in tandem with the flattening counterpart method, hoist-single-listpar().

12.29. stop

stop() as Void

This method immediately stops execution of the current pipeline module by throwing an EvalException.

12.30. stop

stop(scope as Id) as Void

scope   Id   MODULE | PIPELINE

This method immediately stops execution of the current UPL program by throwing an EvalException. You can choose whether you only want to stop the execution of the currently running module, or the whole pipeline using the scope parameter:

MODULE

stops the currently running module by throwing an EvalException

PIPELINE

stops the currently running pipeline by throwing an EvalException and setting the internal cancel flag, i.e. to the application it looks like the user has additionally clicked the Cancel button, aborting any further pipeline execution.

12.31. system-exec

system-exec(executable as String, parameters as List, options as List) as Numeric

executable   String   full file path
parameters   List
options      List     map-like list of two-element (name, value)-pair lists

This function runs the executable with the given parameters.

The function returns the result code of the called executable when wait is true. When wait is false, the function immediately returns 0.

When the function times out, an Exception is thrown.

executable must be the absolute path to the executable.

parameters are the parameters to pass to the executable on the commandline, with each element being subjected to a to-string() call before being passed on.

options is a List of two-element {option as Id,value as Value} Lists. The following options are supported:

Option Name | Value Type | Default Value (when not specified) | Description
timeout | Numeric | 300 | timeout for the command in seconds, after which it is killed and an Exception is thrown
stdout-redirect | String | null | either an absolute file path to redirect stdout of the command to, or upcast:varname to write it to the pipeline variable varname
stderr-redirect | String | null | either an absolute file path to redirect stderr of the command to, or upcast:varname to write it to the pipeline variable varname
wait | Bool | true | when true, the call returns only when the command has terminated; otherwise, the call returns immediately and the command is executed unsynchronized in a parallel, separate thread
env | String | "" | a command environment specification string using the same syntax as the one for the Commandline Processor
inactivity-timeout | Numeric | 0 | maximum time span (in seconds) with no command output on either stdout or stderr, after which the command is considered to be hanging/stalling and killed, throwing an Exception. When this is 0 (the default), command output is not monitored and this option is effectively disabled.

Example 5.98. 

system-exec( "/bin/ls", {"-A", "/Users/"}, 
    {
      { wait, true },
      { timeout, 5 },
      { stdout-redirect, "upcast:stdoutvar" }
    }
);

will (if successful) wait until the command finishes, return 0, and write the command's standard output to the pipeline variable stdoutvar. After that call,

$pipeline:stdoutvar

might contain a String like

.localized
Shared
admin
chris
testuser
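How a partially specified options list combines with the documented defaults can be sketched as follows. This is a hypothetical Java illustration, not the system-exec implementation: the class ExecOptionsSketch and its withDefaults method are invented names, the defaults are taken from the options table above, and rejecting unknown option names is an assumption made for this sketch only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ExecOptionsSketch {
    /** Starts from the documented defaults and overrides them with the given pairs. */
    static Map<String, Object> withDefaults(List<List<Object>> options) {
        Map<String, Object> resolved = new LinkedHashMap<>();
        resolved.put("timeout", 300);
        resolved.put("stdout-redirect", null);
        resolved.put("stderr-redirect", null);
        resolved.put("wait", true);
        resolved.put("env", "");
        resolved.put("inactivity-timeout", 0);
        for (List<Object> pair : options) {
            String name = String.valueOf(pair.get(0));
            if (!resolved.containsKey(name)) {
                // Assumption: unknown names are rejected; the real function may differ.
                throw new IllegalArgumentException("unknown option: " + name);
            }
            resolved.put(name, pair.get(1));
        }
        return resolved;
    }

    public static void main(String[] args) {
        // The options from the example; everything else keeps its default.
        Map<String, Object> resolved = withDefaults(Arrays.asList(
            Arrays.<Object>asList("wait", true),
            Arrays.<Object>asList("timeout", 5),
            Arrays.<Object>asList("stdout-redirect", "upcast:stdoutvar")));
        System.out.println(resolved);
    }
}
```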

12.32. test-xpath

test-xpath(xpathExpression as String) as Bool

xpathExpression   String   XPath 1.0 expression

This function is an optimized shorthand for determining the effective boolean value of xpathExpression.

The following equivalence holds:

test-xpath( "expr" )

is equivalent to

eval-xpath( "(expr) and true()" ) cast as Bool
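The notion of an effective boolean value is the one defined by XPath 1.0: a non-empty node-set is true, an empty one is false. The Java sketch below demonstrates this with the standard javax.xml.xpath API. It is only an illustration: the class TestXPathSketch is a hypothetical name, and the expression is evaluated against a small document rather than against a UPL context node.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class TestXPathSketch {
    /** Evaluates expr against the given XML text and returns its boolean value. */
    static boolean testXPath(String xml, String expr) {
        try {
            Object result = XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
            return (Boolean) result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("cannot evaluate: " + expr, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<d><p a='12'/><p a='12'/></d>";
        System.out.println(testXPath(xml, "/d/p[@a='12']"));     // non-empty node-set: true
        System.out.println(testXPath(xml, "(/d/i) and true()")); // empty node-set: false
    }
}
```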

12.33. throw

throw(exceptionType as Id) as Null

exceptionType   Id   Exception | EvalException | UserDefinedException | TypeConversionException

Throws an exception of the type set in exceptionType.

12.34. unique-timestamp

unique-timestamp() as String

This method creates a unique time stamp (within this Java Virtual Machine instance) of the form n…n-nnnn, with n being a decimal digit.

The first part of the time stamp identifier is determined by calling System.currentTimeMillis(), the second, four-digit part is generated by a ring counter incremented for each call to this method to make up for a lower, system dependent millisecond resolution.
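The combination of a millisecond clock and a ring counter can be sketched in a few lines of Java. This is a minimal illustration of the scheme described above, not the actual implementation: the class UniqueTimestampSketch is a hypothetical name, and the counter range of 0000 to 9999 is inferred from the four-digit format.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class UniqueTimestampSketch {
    // Ring counter shared by all callers within this JVM instance.
    private static final AtomicInteger COUNTER = new AtomicInteger();

    /** Returns a time stamp of the form n...n-nnnn. */
    static String next() {
        // Take the current counter value and advance it, wrapping after 9999.
        int ring = COUNTER.getAndUpdate(c -> (c + 1) % 10000);
        return System.currentTimeMillis() + "-" + String.format("%04d", ring);
    }

    public static void main(String[] args) {
        System.out.println(next());
        System.out.println(next());
    }
}
```

In this sketch, two values can only collide if the counter wraps around completely within the same clock reading, which is what the counter is meant to make unlikely on systems with a coarse timer.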

12.35. unmangle-string

unmangle-string(mangledString as String) as List

mangledString   String   an upCast RT mangled string

Some internally generated attributes that can have a dynamically varying number of components are mangled in a proprietary way before storing them as a textual value in an element attribute. This method lets you convert that mangled string into a List that can be used for further processing in UPL.

mangledString is the mangled string that should be converted into a List.

When this method is applied to a string value that is not actually mangled, it will return a one-element List with that value as its sole element.

12.36. wl-convert-doc-to-rtf

wl-convert-doc-to-rtf(sourcedoc as String, destrtf as String, command as String, timeout as Numeric) as Numeric

sourcedoc   String    absolute file path
destrtf     String    absolute file path
command     String    Pages || Update || Premacro || Lines || Includelinkedimages || Updatelinks
timeout     Numeric   timeout in seconds

This function uses the WordLink component to convert a Word binary (.doc) file to the equivalent RTF file.

The source Word binary file's absolute path must be specified in sourcedoc.

The desired RTF result file's absolute path must be specified in destrtf.

You can specify the WordLink command string to be additionally executed using the command parameter. Available commands are (case-sensitive!):

Pages

create page marker bookmarks according to the current document formatting into pages

Update

updates any contained fields

Premacro

runs the VB macro named il_premacro (if defined)

Lines

create line marker bookmarks according to the current document formatting into lines

Includelinkedimages

images only linked to the document will be converted to embedded images

Updatelinks

hyperlinks will be updated/re-created

Concatenate the commands, without any whitespace in between, in the desired order of execution. They are executed before the document is exported to RTF.
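The concatenation rule can be illustrated with a small helper. This is a Python sketch for illustration only, not part of UPL or WordLink; the names KNOWN_COMMANDS and build_command are hypothetical.

```python
# Sub-command names as listed above; matching is case-sensitive.
KNOWN_COMMANDS = ("Pages", "Update", "Premacro", "Lines",
                  "Includelinkedimages", "Updatelinks")

def build_command(steps):
    """Join sub-commands in execution order, without any whitespace."""
    for step in steps:
        if step not in KNOWN_COMMANDS:
            raise ValueError(f"unknown WordLink sub-command: {step!r}")
    return "".join(steps)

command = build_command(["Update", "Includelinkedimages", "Pages"])
print(command)  # UpdateIncludelinkedimagesPages
```

Validating the names before joining catches capitalization mistakes such as "pages" early, which matters because the commands are case-sensitive.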

In timeout, you can specify the maximum time in seconds that the conversion is allowed to take. After that time, the command is aborted and a negative error code is returned.

The function returns 0 when the conversion to RTF was performed successfully. Otherwise, a negative error code is returned (e.g. when running on a system where WordLink is not available or no Word application is installed; see also get-environment-value() with key wordlink-wordversion).

Example 5.99. 

wl-convert-doc-to-rtf( "/word/doc/test.doc", "/word/doc/converted.rtf", "UpdateIncludelinkedimagesPages", 60 );

will try to convert test.doc to converted.rtf, first updating fields, then embedding linked images and finally marking page breaks, with a timeout of 60 seconds.


12.37. wl-convert-doc-to-rtf

wl-convert-doc-to-rtf(sourcedoc as String, destrtf as String, command as String) as Numeric

sourcedoc   String   absolute file path
destrtf     String   absolute file path
command     String   Pages || Update || Premacro || Lines || Includelinkedimages || Updatelinks

This function uses the WordLink component to convert a Word binary (.doc) file to the equivalent RTF file.

The source Word binary file's absolute path must be specified in sourcedoc.

The desired RTF result file's absolute path must be specified in destrtf.

You can specify the WordLink command string to be additionally executed using the command parameter. Available commands are (case-sensitive!):

Pages

create page marker bookmarks according to the current document formatting into pages

Update

updates any contained fields

Premacro

runs the VB macro named il_premacro (if defined)

Lines

create line marker bookmarks according to the current document formatting into lines

Includelinkedimages

images only linked to the document will be converted to embedded images

Updatelinks

hyperlinks will be updated/re-created

Concatenate the commands, without any whitespace in between, in the desired order of execution. They are executed before the document is exported to RTF.

The (default) timeout used for the command is 300 seconds (= 5 minutes).

The function returns 0 when the conversion to RTF was performed successfully. Otherwise, a negative error code is returned (e.g. when running on a system where WordLink is not available or no Word application is installed; see also get-environment-value() with key wordlink-wordversion).

Example 5.100. 

wl-convert-doc-to-rtf( "/word/doc/test.doc", "/word/doc/converted.rtf", "UpdateIncludelinkedimagesPages" );

will try to convert test.doc to converted.rtf, first updating fields, then embedding linked images and finally marking page breaks, with a timeout of 5 minutes (= the default timeout).


12.38. wl-convert-doc-to-rtf

wl-convert-doc-to-rtf(sourcedoc as String, destrtf as String) as Numeric

sourcedoc   String   absolute file path
destrtf     String   absolute file path

This function uses the WordLink component to convert a Word binary (.doc) file to the equivalent RTF file.

The source Word binary file's absolute path must be specified in sourcedoc.

The desired RTF result file's absolute path must be specified in destrtf.

The (default) timeout used for the command is 300 seconds (= 5 minutes).

The function returns 0 when the conversion to RTF was performed successfully. Otherwise, a negative error code is returned (e.g. when running on a system where WordLink is not available or no Word application is installed; see also get-environment-value() with key wordlink-wordversion).

Example 5.101. 

wl-convert-doc-to-rtf( "/word/doc/test.doc", "/word/doc/converted.rtf" );

will try to convert test.doc to converted.rtf with a timeout of 5 minutes (= the default timeout).


12.39. wl-convert-rtf-to-doc

wl-convert-rtf-to-doc(sourcertf as String, destdoc as String, timeout as Numeric) as Numeric

sourcertf   String    absolute file path
destdoc     String    absolute file path
timeout     Numeric   timeout in seconds

This function uses the WordLink component to convert an RTF file to a Word binary (.doc) file.

The source RTF file's absolute path must be specified in sourcertf.

The desired Word binary result file's absolute path must be specified in destdoc.

In timeout, you can specify the maximum time in seconds that the conversion is allowed to take. After that time, the command is aborted and a negative error code is returned.

The function returns 0 when the conversion was performed successfully. Otherwise, a negative error code is returned (e.g. when running on a system where WordLink is not available or no Word application is installed; see also get-environment-value() with key wordlink-wordversion).

Example 5.102. 

wl-convert-rtf-to-doc( "/word/doc/test.rtf", "/word/doc/converted.doc", 60 );

will try to convert test.rtf to converted.doc with a timeout of 60 seconds.


12.40. wl-convert-rtf-to-doc

wl-convert-rtf-to-doc(sourcertf as String, destdoc as String) as Numeric

sourcertf   String   absolute file path
destdoc     String   absolute file path

This function uses the WordLink component to convert an RTF file to a Word binary (.doc) file.

The source RTF file's absolute path must be specified in sourcertf.

The desired Word binary result file's absolute path must be specified in destdoc.

The (default) timeout used for the command is 300 seconds (= 5 minutes).

The function returns 0 when the conversion was performed successfully. Otherwise, a negative error code is returned (e.g. when running on a system where WordLink is not available or no Word application is installed; see also get-environment-value() with key wordlink-wordversion).

Example 5.103. 

wl-convert-rtf-to-doc( "/word/doc/test.rtf", "/word/doc/converted.doc" );

will try to convert test.rtf to converted.doc with a default timeout of 300 seconds.


12.41. wl-run-command

wl-run-command(command as String, parameters as List, timeout as Numeric) as Numeric

command      String    command (see below)
parameters   List      list of parameter values for the respective command
timeout      Numeric   timeout in seconds

Runs a low-level WordLink command.

In command, you pass the main command you want to be run, and in parameters you pass the parameters for that command.

With the (optional) timeout parameter, you specify the timeout in seconds for the command, after which control unconditionally returns to upCast and the WordLink command is killed. If you specify -1 for the timeout parameter, upCast's default timeout is used, which is currently 300 seconds.

The following commands are supported (capitalization is significant):

test

Shows a dialog box with version information of the WordLink component.

This command has no parameters.

The return value is always 0.

version

This command has no parameters.

Returns the internal version number of the WordLink component.

testdll

** Deprecated ** (the DLL upCast interface is no longer supported!)

Displays the result of the DLL availability test in a GUI dialog.

This command has no parameters.

Returns the DLL version.

versionDLL

** Deprecated ** (the DLL upCast interface is no longer supported!)

This command has no parameters.

Returns the DLL version.

testForWord

This command has no parameters.

Returns 0 if Word is installed and accessible as a COM object on this machine, -1 otherwise.

wordVersion

This command has no parameters.

Returns the Word version installed as an integer number, or a negative error code if the version could not be determined.

doc2rtf

Saves a Word document to RTF format (the format required by upCast).

Several additional sub-functionalities can be requested by appending one or more sub-commands to the main command keyword, where the writing order of the sub-commands governs the execution order:

Pages

inserts specially named bookmarks into the document at all current dynamic page break positions (top of the page). These bookmarks are then automatically transformed into <pagestart /> elements by the RTF Importer module.

Lines

inserts specially named bookmarks into the document at all current dynamic line break positions (start of the line). These bookmarks are then automatically transformed into <linestart /> elements by the RTF Importer module.

Premacro

runs the macro named il_premacro on the document if it exists

Update

updates all fields in the Word document

Includelinkedimages

turns all linked images into embedded images

Updatelinks

updates all linked objects in the Word document

This command takes the following two parameters:

  1. the absolute path in local Windows filepath notation to the source file to convert to RTF and apply the sub-commands on (if specified)

  2. the absolute path in local Windows filepath notation to the destination file where the converted and pre-processed document should be written to

Returns 0 if the operation succeeded, or a negative error code otherwise.

rtf2doc

Saves an RTF file to Word binary (.doc) format.

This command takes the following two parameters:

  1. the absolute path in local Windows filepath notation to the source file to convert to .doc

  2. the absolute path in local Windows filepath notation to the destination file where the converted file should be written to

Returns 0 if the operation succeeded, or a negative error code otherwise.

rtf2docupdatedfields

Same as rtf2doc, but additionally performs an update for any contained fields before saving to Word binary (.doc) format.

This command takes the following two parameters:

  1. the absolute path in local Windows filepath notation to the source file to convert to .doc

  2. the absolute path in local Windows filepath notation to the destination file where the converted file should be written to

Returns 0 if the operation succeeded, or a negative error code otherwise.

rtf2rtf

Saves a source RTF file to a destination RTF file after performing an update for any contained field.

This command takes the following two parameters:

  1. the absolute path in local Windows filepath notation to the source RTF file

  2. the absolute path in local Windows filepath notation to the destination RTF file to write the result of the field update to

Returns 0 if the operation succeeded, or a negative error code otherwise.

callMacro

Opens the specified document and runs the macro with the specified name on it.

The document is not automatically saved after having run the macro. If you want or need this, you should include a document saving step in your macro implementation.

This command takes the following two parameters:

  1. the absolute path in local Windows filepath notation to the source Word file

  2. the name of the macro to call.
Note: The macro name must not contain a question mark (?) or forward slash (/) character.

Returns 0 if the operation succeeded, or a negative error code otherwise.

Example 5.104. Update fields in RTF Exporter result

To update the field contents of an RTF Exporter-generated RTF file (whose absolute local path name is stored in the variable $resultFile), you could use the following code:

wl-run-command( "rtf2rtf", { $resultFile, get-path-component( $resultFile, LOCALBASENAMEPATH ) + "-updatedfields.rtf" }, 120 );

The command will be killed if it takes longer than two minutes.


Example 5.105. Convert .doc file to RTF with marking up dynamic page and line breaks

To convert a Word binary (.doc) file to RTF suitable for processing with upCast's RTF Importer module, and to mark the positions of the current dynamic line and page breaks in the result tree with linestart and pagestart elements, use the following code:

wl-run-command( "doc2rtfPagesLines", { $pipeline:SourceFile, get-path-component( $pipeline:SourceFile, LOCALBASENAMEPATH ) + "-preprocessed.rtf" }, -1 );

The command will be aborted after upCast's default WordLink timeout.
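A command string such as "doc2rtfPagesLines" combines the main command keyword with its sub-commands in execution order. The following Python sketch shows how such a string decomposes. It is a conceptual model only: how WordLink itself parses the string is not documented here, and the name split_doc2rtf is hypothetical.

```python
# Sub-commands of doc2rtf as listed above (case-sensitive).
SUB_COMMANDS = ("Pages", "Lines", "Premacro", "Update",
                "Includelinkedimages", "Updatelinks")

def split_doc2rtf(command):
    """Split a doc2rtf command string into (main command, ordered sub-commands)."""
    main = "doc2rtf"
    if not command.startswith(main):
        raise ValueError(f"not a doc2rtf command: {command!r}")
    rest = command[len(main):]
    steps = []
    while rest:
        # Try longer names first so "Updatelinks" is not read as "Update" + "links".
        for name in sorted(SUB_COMMANDS, key=len, reverse=True):
            if rest.startswith(name):
                steps.append(name)
                rest = rest[len(name):]
                break
        else:
            raise ValueError(f"unknown sub-command at: {rest!r}")
    return main, steps

print(split_doc2rtf("doc2rtfPagesLines"))  # ('doc2rtf', ['Pages', 'Lines'])
```

The order of the returned list mirrors the writing order in the command string, which is what governs the execution order.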


12.42. wl-run-command

wl-run-command(command as String, parameters as List) as Numeric

command      String   command (see below)
parameters   List     list of parameter values for the respective command

Runs a low-level WordLink command.

In command, you pass the main command you want to be run, and in parameters you pass the parameters for that command.

The command runs with upCast's default timeout of 300 seconds.

For details on the available commands and their parameters, see the three-parameter variant of wl-run-command() above.

Note

wl-run-command( command, parameters );

is equivalent to

wl-run-command( command, parameters, -1 );
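The equivalence relies on -1 acting as a sentinel for the default timeout. As a minimal Python model of that rule (the names DEFAULT_TIMEOUT_S and effective_timeout are hypothetical and not part of UPL):

```python
DEFAULT_TIMEOUT_S = 300  # upCast's default timeout in seconds, as stated above

def effective_timeout(timeout=-1):
    """Map the -1 sentinel (or an omitted argument) to the default timeout."""
    return DEFAULT_TIMEOUT_S if timeout == -1 else timeout

print(effective_timeout())     # 300
print(effective_timeout(-1))   # 300
print(effective_timeout(120))  # 120
```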

12.43. xslt

xslt(sourcefile as String, xslt as String, parameters as List, destfile as String, processor as Id) as

sourcefile   String
xslt         String   the transformation sheet
parameters   List     list of two-element lists, representing a map; may be null
destfile     String   absolute URL or one of: "upcast:<realm:name>", "-"
processor    Id       XALAN | SAXON6 | SAXON

This function lets you perform an XSLT transform right from within UPL (without having to resort to a run-module() call using the XSLT processor module).

sourcefile specifies the full path to the XML source file to process.

xslt specifies the full path to the XSLT transformation to apply to sourcefile.

parameters is a List of two-element Lists (key/value pairs) specifying the parameters and their values to be passed to the transform.

destfile specifies the full path to the file where the result of the transform should be written to.

Note

The use of XSLT 2's <xsl:result-document> element may yield additional result files from a single transformation chain.

destfile can take the following two special values to redirect the transformation's main output to either a pipeline variable or have it returned as String directly from this function.

"upcast:pipeline:varname"

redirects the transformation's main output to the variable varname in the pipeline realm.

"-"

redirects the transformation's main output to a String which is then returned as the function's return value.

processor lets you specify which XSLT processor to use:

XALAN

use Apache's XALAN processor supporting XSLT 1

SAXON6

use Michael Kay's Saxon XSLT processor supporting XSLT 1

SAXON

use Saxonica's Saxon-B 9.1.x XSLT processor supporting XSLT 2

The function returns null, except for when the destfile parameter is "-", in which case the function returns the transformation's main output as String.
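The three kinds of destination described above can be summarized in a small decision function. This Python sketch is a conceptual model, not upCast code; the name classify_destfile and the returned labels are assumptions made for illustration.

```python
def classify_destfile(destfile):
    """Return (kind, detail) describing where the main output goes."""
    prefix = "upcast:pipeline:"  # only the pipeline realm is described above
    if destfile == "-":
        # Output is returned as a String from the function.
        return ("return-value", None)
    if destfile.startswith(prefix):
        # Output is stored in the named pipeline variable.
        return ("pipeline-variable", destfile[len(prefix):])
    # Anything else is treated as a regular destination file.
    return ("file", destfile)

print(classify_destfile("out.xml"))
print(classify_destfile("-"))
print(classify_destfile("upcast:pipeline:transform-result"))
```

Only in the "return-value" case does the function return the transformation's main output; in the other two cases it returns null.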

Example 5.106. 

xslt( "src.xml", "transform.xsl", { { date, current-dateTime() } }, "out.xml", XALAN )

will transform src.xml using transform.xsl (supplying the current date and time as parameter "date"), and write the result to out.xml. It will use the XALAN XSLT processor.

xslt( "src.xml", "transform.xsl", {}, "-", SAXON6 )

will transform src.xml using transform.xsl and return the result as String. It will use the Saxon 6 XSLT processor.

xslt( "src.xml", "transform.xsl", {}, "upcast:pipeline:transform-result", SAXON )

will transform src.xml using transform.xsl and store the result as String in the variable transform-result in the pipeline realm, which you then could retrieve using $pipeline:transform-result (from UPL) or ${pipeline:transform-result} (within a pipeline's UI field). It will use the Saxon 9.1.x XSLT processor.


12.44. xslt

xslt(sourcefile as String, xslts as List, parameters as List, destfile as String, processor as Id) as

sourcefile   String
xslts        List     list of the transformation sheets to chain
parameters   List     list of two-element lists, representing a map; may be null
destfile     String
processor    Id       XALAN | SAXON6 | SAXON

This function lets you perform a chain of XSLT transforms right from within UPL (without having to resort to a run-module() call using the XSLT processor module).

sourcefile specifies the full path to the XML source file to process.

xslts specifies (as an ordered List of Strings) the full paths to the XSLT transformations to apply in sequence (i.e., pipelined) to sourcefile.

parameters is a List of two-element Lists (key/value pairs) specifying the parameters and their values to be passed to each of the transforms.

destfile specifies the full path to the file where the result of the transform should be written to.

Note

The use of XSLT 2's <xsl:result-document> element may yield additional result files from a single transformation chain.

destfile can take the following two special values to redirect the transformation's main output to either a pipeline variable or have it returned as String directly from this function.

"upcast:pipeline:varname"

redirects the transformation's main output to the variable varname in the pipeline realm.

"-"

redirects the transformation's main output to a String which is then returned as the function's return value.

processor lets you specify which XSLT processor to use:

XALAN

use Apache's XALAN processor supporting XSLT 1

SAXON6

use Michael Kay's Saxon XSLT processor supporting XSLT 1

SAXON

use Saxonica's Saxon-B 9.1.x XSLT processor supporting XSLT 2

The function returns null, except for when the destfile parameter is "-", in which case the function returns the transformation's main output as String.

Example 5.107. 

xslt( "src.xml", {"t1.xsl","t2.xsl"}, { { date, current-dateTime() } }, "out.xml", SAXON )

will transform src.xml using t1.xsl to an intermediary result (stored in memory), to which t2.xsl will be subsequently applied to get the final result, supplying the current date and time as parameter "date" to each transformation step. The result will be written to out.xml. For each transform, the Saxon-B XSLT 2 processor is used.

For further examples, also see the single-stylesheet variant of xslt() above.
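The chaining behaviour, where each step receives the previous step's output and every step gets the same parameters, can be modelled as a fold over the list of transformations. In this Python sketch, plain string functions stand in for the stylesheets; run_chain, t1 and t2 are hypothetical names and no XSLT processor is involved.

```python
from functools import reduce

def run_chain(source, transforms, params):
    """Feed each step's output to the next; every step gets the same params."""
    return reduce(lambda doc, step: step(doc, params), transforms, source)

# Hypothetical stand-ins for t1.xsl and t2.xsl, operating on plain strings.
def t1(doc, params):
    return doc.upper()

def t2(doc, params):
    return f"{doc} [{params['date']}]"

result = run_chain("src", [t1, t2], {"date": "2015-09-04"})
print(result)  # SRC [2015-09-04]
```

The order of the list matters: swapping t1 and t2 would upper-case the appended date marker as well.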


13. Functions for working with styles

13.1. %

%(condition as BoolExpression) as Numeric

condition   BoolExpression   boolean expression to evaluate in target context

This method calculates the share of descendant text characters for which the effective boolean value of the expression, evaluated on that character, is true. The result is returned as a fraction between 0 and 1.

Conceptually, each descendant character is treated as if it were a context node, and the expression is applied to it. If the effective boolean value is true, that character is flagged as fulfilling the condition. After all descendant characters of the context node have been examined this way, the number of flagged characters divided by the total number of descendant characters is returned.

Example 5.108. 

Suppose we have the following XML fragment (indented for legibility):

<par style="font-weight: normal; font-size: 12pt">
  <inline style="font-size: 18pt">
    <inline style="font-weight: bold">Creation</inline>
    vs.
    <inline style="font-weight: bold">Destruction?</inline>
  </inline>
</par>

The context node is par. The following UPL function call

%(@css:font-weight="bold" and @css:font-size > 16pt)

will evaluate to the value (8 + 12) / 25 = 0.8.

The text content of the context node is (25 characters):

Creation vs. Destruction?
xxxxxxxx.....xxxxxxxxxxxx

with the characters marked x (8 and 12) fulfilling the condition by being typeset in bold and having a font size greater than 16pt.

You can use the above e.g. to find headings based on the assumption that at least 75% of their characters will be bold and have a font size greater than 16pt. So you could use the above in a rule like the following:

[element(par) and %(@css:font-weight="bold" and @css:font-size > 16pt) >= 0.75]
{
  set-heading-level(1); /* make this a heading of level 1 */
} 
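The calculation in the example can be reproduced with a few lines of Python. This is a sketch of the arithmetic only: the run list below is a hand-written stand-in for the styled text of the par element after style inheritance, the name percentage is hypothetical, and returning 0.0 for empty content is an assumption.

```python
# (text, font-weight, font-size in pt) for each run, after style inheritance.
runs = [
    ("Creation", "bold", 18),
    (" vs. ", "normal", 18),
    ("Destruction?", "bold", 18),
]

def percentage(runs, condition):
    """Share of characters whose run satisfies the condition."""
    total = sum(len(text) for text, _, _ in runs)
    hits = sum(len(text) for text, weight, size in runs
               if condition(weight, size))
    # Assumption: empty content yields 0.0 rather than an error.
    return hits / total if total else 0.0

share = percentage(runs, lambda weight, size: weight == "bold" and size > 16)
print(share)  # 0.8
```

With 20 of the 25 characters matching, the result is 0.8, which satisfies the >= 0.75 threshold used in the rule.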

13.2. markup-style

markup-style(condition as BoolExpression, matchAction as String) as Bool

condition     BoolExpression   boolean expression to evaluate with the actual context later
matchAction   String           action to perform when a match occurs (as string)

This function evaluates the style condition condition on the plain character content of all descendants of the context node, optionally creating markup over contiguous runs of matching characters according to the matchAction, which can have the following values:

ignore()

ignore the group (i.e. leave everything as-is)

group-shallow(type)

group as child of the nearest common parent of the matching run, using the specified type

group-deep(type)

group as direct child of the context node, using the specified type

group-custom-shallow(functionname)

group as child of the nearest common parent of the matching run, using the value returned by the custom UPL function functionname for the uci:type attribute. The evaluation context node for the custom function is the grouping element just created.

group-custom-deep(functionname)

group as direct child of the context node, using the value returned by the custom UPL function functionname for the uci:type attribute. The evaluation context node for the custom function is the grouping element just created.

delete-shallow()

delete the matching run's contents as if by doing a group-shallow(), then deleting that

delete-deep()

delete the matching run's contents as if by doing a group-deep(), then deleting that

replace-shallow(…xml source fragment…)

replaces the matching run's contents with the XML tree fragment parsed from the action's argument as if by doing a group-shallow(), then replacing that group as described

replace-deep(…xml source fragment…)

replaces the matching run's contents with the XML tree fragment parsed from the action's argument as if by doing a group-deep(), then replacing that group as described

replace-custom-shallow(functionname)

replaces the matching run's contents with the XML tree fragment parsed from the String returned from the custom UPL function named like the action's argument as if by doing a group-shallow(), then replacing that group as described

replace-custom-deep(functionname)

replaces the matching run's contents with the XML tree fragment parsed from the String returned from the custom UPL function named like the action's argument as if by doing a group-deep(), then replacing that group as described

For actions replace-custom-shallow() and replace-custom-deep(), the custom UPL function whose name is specified as parameter must have the following signature:

function functionname( $current-run as String, $position as Numeric, $runs as List) as String

where:

current-run

is the text of the current run, i.e. the text run for which the function is called

position

This is always 0, since matched runs cannot nest. This parameter exists only for compatibility with functions written for markup-regex().

runs

is a List that contains the current text run as String as its sole element. This parameter exists only for compatibility with functions written for markup-regex().

The function must return a (well-formed, if elements are contained) XML fragment serialized to a String. That returned value is then parsed into an internal XML tree representation and replaces the complete group (incl. its children) that would have been created at that position in the tree.
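The requirement that the returned String be a well-formed XML fragment means that any markup-significant characters in the run text must be escaped. The following Python function mirrors the three-parameter signature as a stand-in for a custom UPL function; it is not UPL code, and the name wrap_run and the element name em are hypothetical.

```python
from xml.sax.saxutils import escape

def wrap_run(current_run, position, runs):
    """Return a serialized, well-formed XML fragment for one matched run."""
    # position and runs are accepted only to mirror the documented signature.
    # <em> is a hypothetical element name chosen for illustration.
    return f"<em>{escape(current_run)}</em>"

print(wrap_run("R&D < now", 0, ["R&D < now"]))  # <em>R&amp;D &lt; now</em>
```

Without the escaping step, a run containing "&" or "<" would produce a fragment that cannot be parsed back into a tree.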

For actions other than replace-…(), on grouping, the function wraps the character run to be grouped in either a uci:inline element (when currently at inline level) or a uci:block element (when currently at block level) with a uci:type attribute that has the value of the specified type id, unless you specified a different element name for grouping using the #set groupQName option.

This function works very similarly to markup-regex(), except that a boolean expression on character properties, rather than a regular expression, determines the groups of contiguous character runs to mark up. The expression is evaluated for each character, and adjacent characters with the same result form one run.

The properties in the condition parameter that can be used are restricted to CSS style attributes only. This means you can only use (synthesized) attributes from upCast's css, cssc and csso namespaces. You cannot query regular attributes (like e.g. uci:diffStyle), since those real attributes are not inherited to text nodes (which cannot have attributes in the first place).

The actions therefore work identically; only the mechanism for determining the groups (text runs) differs. For a graphic of the result of the various actions, please see the examples for markup-regex().

The function returns true when there was at least one match for the condition, false otherwise.

13.3. markup-style

markup-style(condition as BoolExpression, matchAction as String, nonmatchAction as String) as Void

condition        BoolExpression   boolean expression to evaluate with the actual context later
matchAction      String           action to perform when a match occurs (as string)
nonmatchAction   String           action to perform when no match occurs (as string)

This function evaluates the style condition condition on the plain character content of all descendants of the context node, optionally creating markup over contiguous runs of matching characters according to the matchAction (for characters matching the condition) and nonmatchAction (for characters not matching the condition) which can have the following values:

ignore()

ignore the group (i.e. leave everything as-is)

group-shallow(type)

group as child of the nearest common parent of the matching run, using the specified type

group-deep(type)

group as direct child of the context node, using the specified type

group-custom-shallow(functionname)

group as child of the nearest common parent of the matching run, using the value returned by the custom UPL function functionname for the uci:type attribute. The evaluation context node for the custom function is the grouping element just created.

group-custom-deep(functionname)

group as direct child of the context node, using the value returned by the custom UPL function functionname for the uci:type attribute. The evaluation context node for the custom function is the grouping element just created.

delete-shallow()

delete the matching run's contents as if by doing a group-shallow(), then deleting that

delete-deep()

delete the matching run's contents as if by doing a group-deep(), then deleting that

replace-shallow(…xml source fragment…)

replaces the matching run's contents with the XML tree fragment parsed from the action's argument as if by doing a group-shallow(), then replacing that group as described

replace-deep(…xml source fragment…)

replaces the matching run's contents with the XML tree fragment parsed from the action's argument as if by doing a group-deep(), then replacing that group as described

replace-custom-shallow(functionname)

replaces the matching run's contents with the XML tree fragment parsed from the String returned from the custom UPL function named like the action's argument as if by doing a group-shallow(), then replacing that group as described

replace-custom-deep(functionname)

replaces the matching run's contents with the XML tree fragment parsed from the String returned from the custom UPL function named like the action's argument as if by doing a group-deep(), then replacing that group as described

For actions replace-custom-shallow() and replace-custom-deep(), the custom UPL function whose name is specified as parameter must have the following signature:

function functionname( $current-run as String, $position as Numeric, $runs as List) as String

where:

current-run

is the text of the current run, i.e. the text run for which the function is called

position

This is always 0, since matched runs cannot nest. This parameter exists only for compatibility with functions written for markup-regex().

runs

is a List that contains the current text run as String as its sole element. This parameter exists only for compatibility with functions written for markup-regex().

The function must return a (well-formed, if elements are contained) XML fragment serialized to a String. That returned value is then parsed into an internal XML tree representation and replaces the complete group (incl. its children) that would have been created at that position in the tree.

For actions other than replace-…(), on grouping, the function wraps the character run to be grouped in either a uci:inline element (when currently at inline level) or a uci:block element (when currently at block level) with a uci:type attribute that has the value of the specified type id, unless you specified a different element name using the #set groupQName option.

This function works like markup-style(), except that you can also specify an action for characters not matching the boolean expression. It effectively partitions the complete CDATA content of the context node into contiguous groups of character runs that either match or do not match the expression.
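The partitioning into alternating matching and non-matching runs can be sketched with a group-by over per-character flags. This Python model is illustrative only: partition_runs is a hypothetical name, and the flags are set by hand rather than derived from CSS properties.

```python
from itertools import groupby

def partition_runs(chars):
    """chars: list of (character, matches) pairs; returns [(matches, text), ...]."""
    return [(flag, "".join(ch for ch, _ in group))
            for flag, group in groupby(chars, key=lambda pair: pair[1])]

text = "Some red text"
# Hand-set flags: only the word "red" (indices 5 to 7) matches the condition.
flags = [4 < i < 8 for i in range(len(text))]
print(partition_runs(list(zip(text, flags))))
# [(False, 'Some '), (True, 'red'), (False, ' text')]
```

In terms of the function described here, matchAction would apply to the runs flagged True and nonmatchAction to the runs flagged False.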

The properties in the condition parameter that can be used are restricted to CSS style attributes only. This means you can only use (synthesized) attributes from upCast's css, cssc and csso namespaces. You cannot query regular attributes (like e.g. uci:diffStyle), since those real attributes are not inherited to text nodes (which cannot have attributes in the first place).

The following equivalence holds:

markup-style( expr, matchAction )

has the same effect as writing

markup-style( expr, matchAction, "ignore()" )

The function returns true when there was at least one match for the condition, false otherwise.

13.4. markup-style-with-roots

markup-style-with-roots(markuproots as BoolExpression, condition as BoolExpression, matchAction as String) as Bool

markuproots   BoolExpression   boolean expression evaluating to true for all desired descendant markup root elements
condition     BoolExpression   boolean expression to evaluate with the actual context later
matchAction   String           action to perform when a match occurs (as string)

This function evaluates the style condition condition on the plain character content of all descendants of the context node, optionally creating markup over contiguous runs of matching characters according to the matchAction.

The parameter markuproots is a boolean expression that must evaluate to true for exactly those nodes that this function should consider additional markup roots (besides the context node it is called on which is always – and automatically – treated as a markup root).

A markup root is a node that will never get split by a markup action and which acts as the border for the …-deep() variants of actions.

The parameter matchAction can have the following values:

ignore()

ignore the group (i.e. leave everything as-is)

group-shallow(type)

group as child of the nearest common parent or markup root (whichever is nearer to the leaf text node) of the matching run, using the specified type

group-deep(type)

group as direct child of the nearest markup root, using the specified type

group-custom-shallow(functionname)

group as child of the nearest common parent of the matching run, using the value returned by the custom UPL function functionname for the uci:type attribute. The evaluation context node for the custom function is the grouping element just created.

group-custom-deep(functionname)

group as direct child of the context node, using the value returned by the custom UPL function functionname for the uci:type attribute. The evaluation context node for the custom function is the grouping element just created.

delete-shallow()

delete the matching run's contents as if by doing a group-shallow(), then deleting that

delete-deep()

delete the matching run's contents as if by doing a group-deep(), then deleting that

replace-shallow(…xml source fragment…)

replaces the matching run's contents with the XML tree fragment parsed from the action's argument as if by doing a group-shallow(), then replacing that group as described

replace-deep(…xml source fragment…)

replaces the matching run's contents with the XML tree fragment parsed from the action's argument as if by doing a group-deep(), then replacing that group as described

replace-custom-shallow(functionname)

replaces the matching run's contents with the XML tree fragment parsed from the String returned from the custom UPL function named like the action's argument as if by doing a group-shallow(), then replacing that group as described

replace-custom-deep(functionname)

replaces the matching run's contents with the XML tree fragment parsed from the String returned from the custom UPL function named like the action's argument as if by doing a group-deep(), then replacing that group as described

Example 5.109. Example for the use of more than one markup root

Suppose we have the following paragraph, with the underlined portion being a hyperlink:

We now want to mark up all red text in the paragraph. Normally, we would use markup-style() for this as follows:

markup-style( @css:color = red, "group-deep(red)" ); // [1]

However, by grouping deep, this would yield the following conceptual XML:

<p>Some <link>linked </link><red><link>red</link> text</red> here.</p>

As you can see, the link element was split to accommodate the request to group deep on the color red. This is often a problem in real documents, because there are now two consecutive link elements pointing to the same target, which is undesirable. What do we need to do to ensure the link element is never split? We define it as an additional markup root by writing:

markup-style-with-roots( element(link), @css:color = red, "group-deep(red)" ); // [2]

The conceptual XML this call results in now is as follows:

<p>Some <link>linked <red>red</red></link><red> text</red> here.</p>

Technically, the implementation performs the markup-style() operation not (necessarily) on the context node where it is called, but on the nearest ancestor markup root node of each group of text nodes having that same markup root.


For actions replace-custom-shallow() and replace-custom-deep(), the custom UPL function whose name is specified as parameter must have the following signature:

function functionname( $current-run as String, $position as Numeric, $runs as List) as String

where:

current-run

is the text of the current run, i.e. the text run for which the function is called

position

This is always 0, since matched runs cannot nest. This parameter is only there for compatibility reasons of this function with functions written for markup-regex().

runs

is a List that contains as its sole element the current text run as String. This parameter is only there for compatibility reasons of this function with functions written for markup-regex().

The function must return a (well-formed, if elements are contained) XML fragment serialized to a String. That returned value is then parsed into an internal XML tree representation and replaces the complete group (incl. its children) that would have been created at that position in the tree.

For actions other than replace-…(), on grouping, the function will wrap the character run to be grouped by either an uci:inline element (when currently at inline level) or a uci:block element (when currently at block level) with an uci:type attribute that has the value of the specified type id, unless you specified a different element name using the #set groupQName option.

This function works very similarly to markup-regex(), except that the individual groups of contiguous runs of characters to mark up are determined not by a regular expression but by a boolean expression on character properties, which is evaluated for each character.

The properties in the condition parameter that can be used are restricted to CSS style attributes only. This means you can only use (synthesized) attributes from upCast's css, cssc and csso namespaces. You cannot query regular attributes (like e.g. uci:diffStyle), since those real attributes are not inherited to text nodes (which cannot have attributes in the first place).

Therefore, the actions work identically; only the mechanism for determining the groups (or text runs) differs. For a graphic of the result of the various actions, please see the examples for markup-regex().

The function returns true when there was at least one match for the condition, false otherwise.

13.5. markup-style-with-roots

markup-style-with-roots(markuproots as BoolExpression, condition as BoolExpression, matchAction as String, nonmatchAction as String) as Bool

markuproots (BoolExpression): boolean expression evaluating to true for all desired descendant markup root elements
condition (BoolExpression): boolean expression to evaluate with the actual context later
matchAction (String): action to perform when a match occurs
nonmatchAction (String): action to perform when there's no match

This function evaluates the style condition condition on the plain character content of all descendants of the context node, optionally creating markup over contiguous runs of matching characters according to the matchAction (for characters matching the condition) and nonmatchAction (for characters not matching the condition).

The parameter markuproots is a boolean expression that must evaluate to true for exactly those nodes that this function should consider additional markup roots (besides the context node it is called on which is always – and automatically – treated as a markup root).

A markup root is a node that will never get split by a markup action and which acts as the border for the …-deep() variants of actions.

The parameter matchAction can have the following values:

ignore()

ignore the group (i.e. leave everything as-is)

group-shallow(type)

group as child of the nearest common parent or markup root (whichever is nearer to the leaf text node) of the matching run, using the specified type

group-deep(type)

group as direct child of the nearest markup root, using the specified type

group-custom-shallow(functionname)

group as child of the nearest common parent of the matching run, using the value returned by the custom UPL function functionname for the uci:type attribute. The evaluation context node for the custom function is the grouping element just created.

group-custom-deep(functionname)

group as direct child of the context node, using the value returned by the custom UPL function functionname for the uci:type attribute. The evaluation context node for the custom function is the grouping element just created.

delete-shallow()

delete the matching run's contents as if by doing a group-shallow(), then deleting that

delete-deep()

delete the matching run's contents as if by doing a group-deep(), then deleting that

replace-shallow(…xml source fragment…)

replaces the matching run's contents with the XML tree fragment parsed from the action's argument as if by doing a group-shallow(), then replacing that group as described

replace-deep(…xml source fragment…)

replaces the matching run's contents with the XML tree fragment parsed from the action's argument as if by doing a group-deep(), then replacing that group as described

replace-custom-shallow(functionname)

replaces the matching run's contents with the XML tree fragment parsed from the String returned from the custom UPL function named like the action's argument as if by doing a group-shallow(), then replacing that group as described

replace-custom-deep(functionname)

replaces the matching run's contents with the XML tree fragment parsed from the String returned from the custom UPL function named like the action's argument as if by doing a group-deep(), then replacing that group as described

Example 5.110. Example for the use of more than one markup root

Suppose we have the following paragraph, with the underlined portion being a hyperlink:

We now want to mark up all red text in the paragraph. Normally, we would use markup-style() for this as follows:

markup-style( @css:color = red, "group-deep(red)", "ignore()" ); // [1]

However, by grouping deep, this would yield the following conceptual XML:

<p>Some <link>linked </link><red><link>red</link> text</red> here.</p>

As you can see, the link element was split to accommodate the request to group deep on the color red. This is often a problem in real documents, because there are now two consecutive link elements pointing to the same target, which is undesirable. What do we need to do to ensure the link element is never split? We define it as an additional markup root by writing:

markup-style-with-roots( element(link), @css:color = red, "group-deep(red)", "ignore()" ); // [2]

The conceptual XML this call results in now is as follows:

<p>Some <link>linked <red>red</red></link><red> text</red> here.</p>

Technically, the implementation performs the markup-style() operation not (necessarily) on the context node where it is called, but on the nearest ancestor markup root node of each group of text nodes having that same markup root.


For actions replace-custom-shallow() and replace-custom-deep(), the custom UPL function whose name is specified as parameter must have the following signature:

function functionname( $current-run as String, $position as Numeric, $runs as List) as String

where:

current-run

is the text of the current run, i.e. the text run for which the function is called

position

This is always 0, since matched runs cannot nest. This parameter is only there for compatibility reasons of this function with functions written for markup-regex().

runs

is a List that contains as its sole element the current text run as String. This parameter is only there for compatibility reasons of this function with functions written for markup-regex().

The function must return a (well-formed, if elements are contained) XML fragment serialized to a String. That returned value is then parsed into an internal XML tree representation and replaces the complete group (incl. its children) that would have been created at that position in the tree.

For actions other than replace-…(), on grouping, the function will wrap the character run to be grouped by either an uci:inline element (when currently at inline level) or a uci:block element (when currently at block level) with an uci:type attribute that has the value of the specified type id, unless you specified a different element name using the #set groupQName option.

This function works very similarly to markup-regex(), except that the individual groups of contiguous runs of characters to mark up are determined not by a regular expression but by a boolean expression on character properties, which is evaluated for each character.
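The idea of splitting text into contiguous runs by a per-character boolean result can be sketched in Java. This is only an illustrative model, not upCast's implementation: the class StyleRuns, the method splitIntoRuns and the boolean array standing in for the evaluated style condition are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class StyleRuns {
    /**
     * Splits text into contiguous runs whose characters all share the same
     * condition result. matches[i] stands in for the style condition evaluated
     * for character i (an assumption made for this sketch). Each run is
     * prefixed with '+' (matching) or '-' (non-matching).
     */
    static List<String> splitIntoRuns(String text, boolean[] matches) {
        List<String> runs = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= text.length(); i++) {
            // Close the current run at the end of the text or when the result flips.
            if (i == text.length() || matches[i] != matches[start]) {
                runs.add((matches[start] ? "+" : "-") + text.substring(start, i));
                start = i;
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        String text = "Some linked red text here.";
        boolean[] red = new boolean[text.length()];
        int from = text.indexOf("red text");
        for (int i = from; i < from + "red text".length(); i++) {
            red[i] = true; // pretend these characters have css:color = red
        }
        System.out.println(splitIntoRuns(text, red));
    }
}
```

In this model, the matching runs would be handed to matchAction and the non-matching runs to nonmatchAction.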

The properties in the condition parameter that can be used are restricted to CSS style attributes only. This means you can only use (synthesized) attributes from upCast's css, cssc and csso namespaces. You cannot query regular attributes (like e.g. uci:diffStyle), since those real attributes are not inherited to text nodes (which cannot have attributes in the first place).

Therefore, the actions work identically; only the mechanism for determining the groups (or text runs) differs. For a graphic of the result of the various actions, please see the examples for markup-regex().

The function returns true when there was at least one match for the condition, false otherwise.

14. String Functions

14.1. codepoints-to-string

codepoints-to-string(codepoints as List) as String

codepoints (List): list of Numeric objects representing Unicode code points

Creates a String from codepoints, which must be a List of Numerics representing Unicode code points. Returns the zero-length String if codepoints is an empty List.
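For readers more familiar with Java, the described behaviour roughly corresponds to the following sketch. The class and method names are hypothetical, and a List of Integer stands in for a UPL List of Numerics. The reverse direction, as described for string-to-codepoints(), is included for comparison.

```java
import java.util.List;
import java.util.stream.Collectors;

public class Codepoints {
    /** Builds a String from Unicode code points; an empty list gives "". */
    static String codepointsToString(List<Integer> codepoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codepoints) {
            sb.appendCodePoint(cp); // also handles code points above U+FFFF
        }
        return sb.toString();
    }

    /** Returns the code points that constitute s; "" gives an empty list. */
    static List<Integer> stringToCodepoints(String s) {
        return s.codePoints().boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(codepointsToString(List.of(0x75, 0x70, 0x43, 0x61, 0x73, 0x74)));
        System.out.println(stringToCodepoints("upCast"));
    }
}
```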

14.2. contains

contains(source as String, searchText as String) as Bool

source (String): string to test
searchText (String): search text

Determines if the string source contains the text searchText as substring.

14.3. decode-from-uri

decode-from-uri(uriquoted as String) as String

uriquoted (String)

This function decodes URI-encoded characters in uriquoted. It un-escapes each percent-encoded sequence by replacing it with the character it represents, as described in RFC 3986.

The behaviour of the function when passed an invalid URI-encoded string as per RFC 3986 is undefined.

Example 5.111. 

decode-from-uri("~b%C3%A9b%C3%A8%2F" )

will return "~bébè/".


See also: encode-for-uri()

14.4. encode-for-uri

encode-for-uri(uriPart as String) as String

uriPart (String)

This function encodes reserved URI characters in uriPart. It escapes a reserved character by replacing it with its percent-encoded form as described in RFC 3986.

All characters are escaped except those identified as "unreserved" by RFC 3986, that is the upper- and lower-case letters A-Z, the digits 0-9, HYPHEN-MINUS ("-"), LOW LINE ("_"), FULL STOP ("."), and TILDE ("~").

Note

Note that this function will escape URI delimiters and therefore cannot be used indiscriminately to encode "invalid" characters in a path segment.

Example 5.112. 

encode-for-uri("~bébè/" )

will return "~b%C3%A9b%C3%A8%2F".
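The escaping rule can be sketched in Java as follows. This is a minimal model that assumes UTF-8 as the byte encoding (which is what the example output above suggests); the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class EncodeForUri {
    // The "unreserved" characters of RFC 3986, which are never escaped.
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~";

    /** Percent-encodes every byte of the UTF-8 form that is not unreserved. */
    static String encodeForUri(String uriPart) {
        StringBuilder out = new StringBuilder();
        for (byte b : uriPart.getBytes(StandardCharsets.UTF_8)) {
            int value = b & 0xFF;
            if (UNRESERVED.indexOf((char) value) >= 0) {
                out.append((char) value);
            } else {
                out.append(String.format("%%%02X", value));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "~b\u00e9b\u00e8/" is the input string of the example above.
        System.out.println(encodeForUri("~b\u00e9b\u00e8/"));
    }
}
```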


See also: decode-from-uri()

14.5. ends-with

ends-with(source as String, searchText as String) as Bool

source (String): string to test
searchText (String): search text

Determines if the string source ends in the text searchText.

14.6. escape-characters (deprecated) 

escape-characters(sourceString as String, escapeChars as String, escapingMode as Numeric) as String

This function is deprecated and should no longer be used. It will be removed in a subsequent release.

14.7. escape-characters

escape-characters(source as String, mode as Id) as String

source (String): a string
mode (Id): BACKSLASH | XML | XMLATTR

Escapes characters in source using the specified mode.

Currently, the following escaping modes are supported:

BACKSLASH

This mode escapes the backslash '\' character using a backslash (effectively doubling it).

XML

This mode escapes the following characters that are not allowed to occur verbatim in XML PCDATA content by replacing them with character references: < > &

XMLATTR

This mode escapes the following characters that are not allowed to occur verbatim in XML attribute data content by replacing them with character references: < > & " ' U+000A U+000D

Example 5.113. 

escape-characters( "R&D", XML )

will return the string R&amp;D .

escape-characters( """C:\""", BACKSLASH )

will return the string C:\\ .


14.8. escape-characters

escape-characters(source as String, mode as Id, characters as String) as String

source (String): a string
mode (Id): BACKSLASH | DOUBLING
characters (String): the characters to escape

Escapes all characters found in characters that occur in source using the specified mode.

Currently, the following escaping modes are supported:

BACKSLASH

This mode escapes all characters that are in characters using a backslash '\'. This will also always include the backslash character itself, even if not explicitly specified in characters.

DOUBLING

This mode escapes all characters that are in characters by doubling them.

Example 5.114. 

escape-characters( """lose""", DOUBLING, "eo" )

will return the string loosee .

escape-characters( """C:\testing""", BACKSLASH, "si" )

will return the string C:\\te\st\ing .
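The two modes can be modelled in Java as shown below. This is a sketch of the behaviour described above, not the actual implementation; the class, enum and method names are hypothetical.

```java
public class EscapeCharacters {
    enum Mode { BACKSLASH, DOUBLING }

    /** Escapes every character of source that is listed in characters. */
    static String escape(String source, Mode mode, String characters) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            boolean listed = characters.indexOf(c) >= 0;
            if (mode == Mode.BACKSLASH) {
                // The backslash itself is always escaped, even if not listed.
                if (listed || c == '\\') {
                    out.append('\\');
                }
            } else if (listed) {
                out.append(c); // DOUBLING: emit the character one extra time
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("lose", Mode.DOUBLING, "eo"));
        System.out.println(escape("C:\\testing", Mode.BACKSLASH, "si"));
    }
}
```

Run on the inputs of the two examples above, the sketch produces the same strings as documented there.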


14.9. format-numeric

format-numeric(num as Numeric, dimension as Id, precision as Numeric) as String

num (Numeric): the length value to format
dimension (Id): cm | mm | in | pt
precision (Numeric): fraction length in digits (≥0)

This method serves to format a Numeric value into a string. This is most useful for length valued Numerics, as the function includes automatic unit conversion in this case.

num is the Numeric to be formatted.

dimension is the target unit or dimension (as an Id) to convert the result into.

precision is the integer number of decimals to which the formatted value should be rounded.

Example 5.115. 

format-numeric( 1in, cm, 2 )

will return the String "2.54cm".

format-numeric( 45.67mm, cm, 2 )

will return the String "4.57cm".
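The unit conversion and rounding can be approximated in Java as follows. This sketch passes the source unit as a separate String because Java has no length-valued numbers, and it relies on the rounding of String.format; the actual rounding mode of format-numeric() is not stated above. All names are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

public class FormatNumeric {
    // Typographic points per unit: 1in = 72pt = 2.54cm = 25.4mm.
    private static final Map<String, Double> POINTS_PER_UNIT = Map.of(
        "pt", 1.0,
        "in", 72.0,
        "cm", 72.0 / 2.54,
        "mm", 72.0 / 25.4);

    /** Converts value (given in unit) to dimension and formats it. */
    static String formatNumeric(double value, String unit, String dimension, int precision) {
        double points = value * POINTS_PER_UNIT.get(unit);
        double converted = points / POINTS_PER_UNIT.get(dimension);
        return String.format(Locale.ROOT, "%." + precision + "f", converted) + dimension;
    }

    public static void main(String[] args) {
        System.out.println(formatNumeric(1, "in", "cm", 2));
        System.out.println(formatNumeric(45.67, "mm", "cm", 2));
    }
}
```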


14.10. get-string-width

get-string-width(text as String, font as String, fontsize as Numeric, italic as Bool, bold as Bool) as Numeric

text (String): string to measure
font (String): font name
fontsize (Numeric): font size
italic (Bool): true when calculating for italic font variant
bold (Bool): true when calculating for bold font variant

Calculates the absolute width the given text will use up during rendering when using the specified font at the specified fontsize, and when using none, either or both of italic and bold properties during rendering.

Important

Since the calculation algorithm uses the font metrics as Java sees them and also requires that the specified font be installed and accessible by the JVM running the application, there's no guarantee that the width returned by this function will be identical to the width of the string as Word (or any other rendering engine, in fact) would render it. Use this only for a rough estimate as to how wide the specified text will probably run.

Example 5.116. 

get-string-width( "Hello world!", "Times New Roman", 12pt, false, false )

might return a value of 1233tw on a Mac default installation, and

get-string-width( "Hello world!", "Times New Roman", 12pt, false, true )

might return a value of 1286tw on a Mac default installation, since the bold variant of the font runs wider.


14.11. index-of

index-of(text as String, substring as String) as Numeric

text (String): a string
substring (String): string to search for and return the starting index of (if found)

Returns the index within text of the first occurrence of substring. If substring does not occur, -1 is returned.

14.12. index-of

index-of(text as String, substring as String, fromIndex as Numeric) as Numeric

text (String): a string
substring (String): string to search for and return the starting index of (1-based; if found)
fromIndex (Numeric): position to start searching at (1-based)

Returns the index within text of the first occurrence of substring, starting at the specified fromIndex. If substring does not occur, -1 is returned.
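Because Java's own indexOf() is 0-based, a Java model of this function has to shift positions by one. The sketch below assumes that both fromIndex and the returned index are 1-based, as the parameter descriptions above indicate; the class and method names are hypothetical.

```java
public class IndexOf {
    /** 1-based variant of String.indexOf; returns -1 if substring is not found. */
    static int indexOf(String text, String substring, int fromIndex) {
        int zeroBased = text.indexOf(substring, fromIndex - 1);
        return zeroBased < 0 ? -1 : zeroBased + 1;
    }

    public static void main(String[] args) {
        String text = "a green glass";
        System.out.println(indexOf(text, "g", 1)); // search from the first character
        System.out.println(indexOf(text, "g", 4)); // skip the first "g"
        System.out.println(indexOf(text, "x", 1)); // no occurrence
    }
}
```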

14.13. lower-case

lower-case(text as String) as String

text (String): a string

Returns the passed string converted to all lower-case. The result is the same as if calling Java's java.lang.String.toLowerCase() on the source.

14.14. matches

matches(sourceString as String, regExpr as String) as Bool

sourceString (String): a string
regExpr (String): regular expression to match sourceString against

Determines if the sourceString matches the regular expression regExpr (passed as String).

14.15. matches-list

matches-list(sourceString as String, regExpr as String) as List

sourceString (String): a string
regExpr (String): regular expression to match sourceString against

This function returns all matches of regExpr in sourceString as a List of Lists.

The inner lists contain as their first element the complete matching subsequence, followed by all capturing groups as defined in the match-pattern. Each occurring match of the whole pattern in sourceString creates one entry in the outer list.

Example 5.117. 

matches-list( "aabb", "a(b)*")

will return

{
  {"a",""},
  {"abb","b"}
}

Note how a repeated group always only returns its last match (as per the regular expression semantics) – here this is the single "b" in the second match of the pattern.
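The same result structure can be reproduced with java.util.regex, which may help when predicting what a pattern will return. The sketch below is a model only; the names are hypothetical, and mapping a non-participating group to the empty string is an assumption derived from the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesList {
    /** One inner list per match: the whole match, then each capturing group. */
    static List<List<String>> matchesList(String sourceString, String regExpr) {
        List<List<String>> result = new ArrayList<>();
        Matcher m = Pattern.compile(regExpr).matcher(sourceString);
        while (m.find()) {
            List<String> entry = new ArrayList<>();
            entry.add(m.group());
            for (int g = 1; g <= m.groupCount(); g++) {
                String value = m.group(g);
                // A group that did not take part in the match is null in Java.
                entry.add(value == null ? "" : value);
            }
            result.add(entry);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(matchesList("aabb", "a(b)*"));
    }
}
```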


14.16. normalize-space

normalize-space(sourceString as String) as String

sourceString (String): a string

Normalizes whitespace in the passed string argument and returns it.

Whitespace normalization is performed by stripping leading and trailing whitespace and then replacing a sequence of two or more whitespace characters by a single U+0020 (SPACE) character.

Whitespace for this function is defined as any of the following four characters: U+0009 (CHARACTER TABULATION), U+000A (LINE FEED (LF)), U+000D (CARRIAGE RETURN (CR)), U+0020 (SPACE).
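A literal Java reading of this description is sketched below. The names are hypothetical. Following the wording above, only sequences of two or more whitespace characters are collapsed, so a single tab or line break between words is left unchanged in this model; whether the actual function also converts those is not stated here.

```java
public class NormalizeSpace {
    // The four whitespace characters named above: TAB, LF, CR, SPACE.
    private static final String WS = "[\\t\\n\\r ]";

    static String normalizeSpace(String sourceString) {
        // Step 1: strip leading and trailing whitespace.
        String stripped = sourceString.replaceAll("^" + WS + "+|" + WS + "+\\z", "");
        // Step 2: collapse each sequence of two or more whitespace characters.
        return stripped.replaceAll(WS + "{2,}", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + normalizeSpace("  a \t b\n\nc  ") + "]");
    }
}
```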

14.17. parse-numbering

parse-numbering(numbering as String, format as List) as List

numbering (String): the numbering string
format (List): the expected numbering format

This function lets you parse a formatted, possibly multi-level number into the (list of) integers it represents.

The function throws an EvalException when the numbering string does not match the format.

It is intended, for example, to parse a textual list or heading numbering string to determine the resulting nesting level and nesting structure of the respective logical item. This can be useful when the nesting structure has not been marked up explicitly in the source document by applying appropriate styles or choosing descriptive markup and must be inferred solely from the textual numbering string present.

numbering contains the numbering string (which may be multi-level) to parse.

format is the list of Strings defining the expected format of numbering.

The individual items of format are classified either as separator tokens or numbering format tokens.

Separator tokens: '*' | '?' | '+' | literalcharseq

The tokens have the following meanings:

*

matches 0, 1 or more characters

?

matches 0 or 1 character

+

matches 1 or more characters

literalcharseq

matches exactly the specified sequence of characters. If you want to match *, ?, +, # or \ literally, you must quote them with a backslash (\).

Numbering format tokens: '#' ( 'i' | 'I' | 'a' | 'A' | '1' | 'h' | 'H' | 'b' | 'lower-greek' | 'upper-greek' )

Numbering tokens are identified by a leading hash mark (#). The tokens have the following meaning:

i, I

roman numbering, either all lowercase or uppercase, respectively

a, A

alphabetic numbering, either all lowercase or uppercase, respectively

1

decimal numbering

h, H

hex numbering, either all lowercase or uppercase, respectively

b

binary numbering (only 0 or 1 as a sequence)

lower-greek, upper-greek

greek numbering, either all lowercase or uppercase, respectively. See also: CSS3 Lists.

Additionally, each token may have options, each of which is added with a leading '/' character. No whitespace is allowed between options.

Separator token options:

/allowed=charseq

[only for wildcards *,?,+] charseq lists the characters allowed in a matching wildcard string. To include the forward slash (/) in the list of allowed characters, you need to quote it with a backward slash (\) as otherwise, it would be treated as the start marker of the next option.

Numbering format token options:

/repeat=integer

indicates that each numbering character must be repeated the specified number of times. The default is 1. For a numbering like "aa", "bb" use the format token "#a/repeat=2".

/ignore-case

the case of the numbering characters is disregarded. Use this option when the case of numbering may vary without changing its semantics. Most useful in hex numbering format (h, H) when you cannot be sure whether the numbers are specified in upper or lower case (or even mixed).

The result is a List of Numeric values, one for each numbering format token in format, with the (1-based) numeric value of the numbering string.
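How a single numbering component maps to its 1-based numeric value can be sketched in Java. The sketch covers only the roman, alphabetic (with /repeat) and hex formats and does not attempt to parse a format list; all names are hypothetical, and case is always ignored here for brevity.

```java
import java.util.Locale;

public class NumberingValues {
    /** Roman numeral to integer, using the usual subtractive rule. */
    static int roman(String s) {
        String symbols = "IVXLCDM";
        int[] values = {1, 5, 10, 50, 100, 500, 1000};
        String upper = s.toUpperCase(Locale.ROOT);
        int total = 0;
        for (int i = 0; i < upper.length(); i++) {
            int v = valueOf(symbols, values, upper.charAt(i));
            boolean smallerThanNext = i + 1 < upper.length()
                && v < valueOf(symbols, values, upper.charAt(i + 1));
            total += smallerThanNext ? -v : v; // e.g. the I in IV counts as -1
        }
        return total;
    }

    private static int valueOf(String symbols, int[] values, char c) {
        int index = symbols.indexOf(c);
        if (index < 0) {
            throw new IllegalArgumentException("not a roman digit: " + c);
        }
        return values[index];
    }

    /** Alphabetic numbering such as "a" or, with repeat = 2, "bb". */
    static int alphabetic(String s, int repeat) {
        if (s.isEmpty() || s.length() != repeat) {
            throw new IllegalArgumentException("unexpected length: " + s);
        }
        char first = Character.toLowerCase(s.charAt(0));
        for (int i = 0; i < s.length(); i++) {
            if (Character.toLowerCase(s.charAt(i)) != first) {
                throw new IllegalArgumentException("characters differ: " + s);
            }
        }
        if (first < 'a' || first > 'z') {
            throw new IllegalArgumentException("not a letter: " + s);
        }
        return first - 'a' + 1; // 1-based: a = 1, b = 2, ...
    }

    /** Hex numbering; Integer.parseInt accepts upper and lower case digits. */
    static int hex(String s) {
        return Integer.parseInt(s, 16);
    }

    public static void main(String[] args) {
        System.out.println(alphabetic("A", 1) + ", " + roman("iv"));
        System.out.println(alphabetic("bb", 2));
        System.out.println(hex("1F"));
    }
}
```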

Example 5.118. Examples:

parse-numbering( "A.iv", { "#A/ignore-case", "*", "#i" } )

will return

{ 1, 4 }

 

parse-numbering( "1bb", { "#1", "*", "#a/repeat=2" } )

will return

{ 1, 2 }

 

parse-numbering( "#4", { "\\#", "#1" } )

will return

{ 4 }

 

parse-numbering( "1.2", { "#1", ".", "#1", ".", "#1" } )

will throw an EvalException because the format does not match the numbering (it contains more components than the input has).


14.18. process-adjacent-text

process-adjacent-text(where as Id, action as String, regex as String, borderCondition as BoolExpression) as Void

where (Id): LEFT | RIGHT
action (String): pull-text(group?) | push-text(group?) | markup-text-inside(group) | markup-text-outside(group) | delete-text-inside() | delete-text-outside() (as string)
regex (String): regular expression
borderCondition (BoolExpression): boolean expression to evaluate with the actual context later

This function lets you process text content that is adjacent to the context node’s left or right sides (i.e., start or end). This can be either text within the element or outside – which one is determined by the action. The text content to be processed is specified using a regular expression.

The where parameter specifies at which side of the context node adjacent text should be processed:

LEFT

process text adjacent to the left side (=start) of the element

RIGHT

process text adjacent to the right side (=end) of the element.

The action parameter string specifies what to do with any adjacent text that matches the specified regular expression:

pull-text()

Text adjacent to the outside of the context node is pulled into the node and inserted as first (where=LEFT) or last (where=RIGHT) child.

pull-text(group)

Text adjacent to the outside of the context node is pulled into the node and inserted as first (where=LEFT) or last (where=RIGHT) child. Additionally, that text is surrounded by an uci:inline (or uci:block element, respectively) with an uci:type attribute with value group.

push-text()

Text adjacent to the inside of the context node is pushed out of the node and inserted as a left sibling (where=LEFT) or right sibling (where=RIGHT) Text node of the context node.

push-text(group)

Text adjacent to the inside of the context node is pushed out of the node and inserted as left sibling (where=LEFT) or right sibling (where=RIGHT) Text node of the context node. Additionally, that text is surrounded by an uci:inline (or uci:block element, respectively) with an uci:type attribute with value group.

markup-text-inside(group)

Text adjacent to the inside of the context node is extracted from the tree, surrounded by an uci:inline (or uci:block element, respectively) with an uci:type attribute with value group and re-inserted as first (where=LEFT) or last (where=RIGHT) child of the context node.

markup-text-outside(group)

Text adjacent to the outside of the context node is extracted from the tree, surrounded by an uci:inline (or uci:block element, respectively) with an uci:type attribute with value group and re-inserted as left sibling (where=LEFT) or right sibling (where=RIGHT) of the context node.

delete-text-inside()

Text adjacent to the inside of the context node at the respective side is deleted from the tree. Existing element structures are not changed, with the exception that any Text nodes becoming empty after deleting matching content will be removed from the tree as well.

delete-text-outside()

Text adjacent to the outside of the context node at the respective side is deleted from the tree. Existing element structures are not changed, with the exception that any Text nodes becoming empty after deleting matching content will be removed from the tree as well.

The text to match is specified by the regex parameter. The range of supported expressions is identical to the one supported by the Java classes in the java.util.regex package. You should not include start of text (^) or end of text ($) match codes in your regular expression. The checking for adjacency is automatically taken care of by this function.

The parameter borderCondition lets you specify a node in the ancestor axis from the context node which serves as bounding element for regex matches. That node is identified as the first node in the ancestor axis (starting from the context node) for which the specified condition evaluates to true. Any text outside the subtree of that node is not considered for an adjacent outside match.
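The adjacency check can be pictured with a flat-string model in Java: a match counts only if it touches the start (LEFT) or the end (RIGHT) of the text in question. The sketch ignores the document tree and borderCondition entirely; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AdjacentText {
    /** LEFT side: returns { matched text at the start, remaining text }. */
    static String[] splitLeft(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (m.lookingAt()) { // the match must begin at the very start
            return new String[] { m.group(), text.substring(m.end()) };
        }
        return new String[] { "", text };
    }

    /** RIGHT side: returns { remaining text, matched text at the end }. */
    static String[] splitRight(String text, String regex) {
        // \z anchors the match to the very end of the text.
        Matcher m = Pattern.compile("(?:" + regex + ")\\z").matcher(text);
        if (m.find()) {
            return new String[] { text.substring(0, m.start()), m.group() };
        }
        return new String[] { text, "" };
    }

    public static void main(String[] args) {
        String[] left = splitLeft("(2008)", "\\(");
        String[] right = splitRight(left[1], "\\)");
        // pushed-out left text, text that stays inside, pushed-out right text
        System.out.println(left[0] + " | " + right[0] + " | " + right[1]);
    }
}
```

In this model, a push-text() action would move the matched part outside the element, while text that does not match at the respective side (such as a year without parentheses) is left untouched.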

Example 5.119. 

The typical use case for this highly specific function is to account for markup inaccuracies in the source. Let’s assume that the author was supposed to mark up year numbers in green, since you want to create an index on them later. However, the document you get looks like this:

<uci:par>In <uci:inline css:color="green">1999</uci:inline>, upCast development started and continues to date <uci:inline css:color="green">(2008)</uci:inline>.</uci:par>

You’ll notice that for the second year number, the author included the parentheses within the green markup, which is undesirable. To fix such markup mistakes, you could use the following UPL rule:

[element(uci:inline) and @css:color="green"] {
  process-adjacent-text( LEFT, "push-text()", "\\(", element(uci:par) );
  process-adjacent-text( RIGHT, "push-text()", "\\)", element(uci:par) );
}

This will yield the following result:

<uci:par>In <uci:inline css:color="green">1999</uci:inline>, upCast development started and continues to date (<uci:inline css:color="green">2008</uci:inline>).</uci:par>

As you can see, the undesirable parentheses within the green year markup have been pushed out of the uci:inline element. Furthermore, that rule has no effect on markup that is already correct (as is the case with the year number 1999).


Here's a more complex example:

Example 5.120. 

Given the following XML:

<par>A number <bold>123</bold>4<italic>5 example</italic>.</par>

a rule definition like

[element(bold)] {
  process-adjacent-text( RIGHT, "markup-text-outside(num)", "[0-9]*", element(par) );
}

will result in the following:

<par>A number <bold>123</bold><uci:inline uci:type="num">45</uci:inline><italic> example</italic>.</par>

Note how the action markup-text-outside() only works on the text and does not change or take into account any element structures at a higher level. What happens is that in a first step, the adjacent digits are removed from the document, and are then re-inserted with the grouping uci:inline element wrapped around them. The italic info for the digit 5 is not retained.


Another example:

Example 5.121. 

Given the following XML:

<par>The <term><firstchar>A</firstchar>bc</term>.</par>

a rule definition like

[element(term)] {
  process-adjacent-text( LEFT, "push-text(caps)", "[A-Z]+", element(par) );
}

will result in the following:

<par>The <uci:inline uci:type="caps">A</uci:inline><term><firstchar></firstchar>bc</term>.</par>

Note again how the action push-text() only works on the text and does not change or take into account any element structures at a higher level. What happens is that in a first step, the capital letter A is removed from the document and then re-inserted with the grouping uci:inline element wrapped around it. The firstchar info is not retained and remains a (now) empty element within the term element.


14.19. replace

replace(source as String, pattern as String, replacement as String) as String

source (String): a string
pattern (String): regular expression
replacement (String): replacement specification

The function returns the String that is obtained by replacing each non-overlapping substring of source that matches the given pattern with an occurrence of the replacement string.

If two overlapping substrings of source both match pattern, then only the first one (that is, the one whose first character comes first in the source string) is replaced.

The regular expression syntax used and understood by this function is the same as for the implementation of the java.lang.String.replaceAll() function of the underlying Java VM.

For Java 1.4.2, the regular expression syntax is documented in the API documentation of the java.util.regex.Pattern class.

Example 5.122. 

replace( "a green glass", "gr(e+)n", "b$1r" )

returns the String "a beer glass".


14.20. replace

replace(source as String, patternlist as List, replacementlist as List) as String

source (String): a string
patternlist (List): ordered list of regular expressions
replacementlist (List): ordered list of replacement specifications, matching patternlist

This function is similar to replace() except that it performs a sequence of replacement operations on source. The two Lists patternlist and replacementlist must contain an equal number of corresponding pattern/replacement Strings, which are applied to source in order. That is, the following equivalence holds:

$result := replace( $s, { "ä", "ö", "ü" }, { "ae", "oe", "ue" } );

is equivalent to

$result := replace( replace( replace( $s, "ä", "ae" ), "ö", "oe" ), "ü", "ue" );

which is equivalent to

$result := replace( $s, "ä", "ae" );
$result := replace( $result, "ö", "oe" );
$result := replace( $result, "ü", "ue" );

Example 5.123. 

replace( "ä ö ö ü ü", { "ä", "ö", "ü" }, { "ae", "oe", "ue" } )

returns the String "ae oe oe ue ue".
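Since replace() follows java.lang.String.replaceAll(), the list form can be modelled in Java as a simple loop over pattern/replacement pairs. This is a sketch with hypothetical names; how the real function reacts to lists of unequal length is not stated above, so throwing an exception here is an assumption.

```java
import java.util.List;

public class ReplaceList {
    /** Applies each pattern/replacement pair to the result of the previous one. */
    static String replace(String source, List<String> patternlist, List<String> replacementlist) {
        if (patternlist.size() != replacementlist.size()) {
            throw new IllegalArgumentException("lists must have equal length");
        }
        String result = source;
        for (int i = 0; i < patternlist.size(); i++) {
            result = result.replaceAll(patternlist.get(i), replacementlist.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // \u00e4, \u00f6 and \u00fc are the umlauts used in the example above.
        System.out.println(replace("\u00e4 \u00f6 \u00f6 \u00fc \u00fc",
            List.of("\u00e4", "\u00f6", "\u00fc"), List.of("ae", "oe", "ue")));
        System.out.println(replace("a green glass", List.of("gr(e+)n"), List.of("b$1r")));
    }
}
```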


14.21. starts-with

starts-with(source as String, searchText as String) as Bool

source (String): string to test
searchText (String): search text

Determines if the string source starts with the text searchText.

14.22. string-join

string-join(values as List, separator as String) as String

values     List    list of elements (converted to String) to concatenate
separator  String  separator string to insert between adjacent elements

Returns a String created by concatenating the string values of the members of values (applying to-string() where necessary), using separator as the separator string. If the value of separator is the zero-length string, the members of values are concatenated without a separator.

If the value of values is an empty List, the zero-length string is returned.

Example 5.124. 

string-join( { "a", "b", "c" }, ", " )

returns "a, b, c"

string-join( { "a", "b", "c" }, "" )

returns "abc"

string-join( { "a", { b1, b2, b3}, c, 5.123 }, "/" )

returns "a/b1 b2 b3/c/5.123". Note how the to-string() function, implicitly called on the second list member (which is itself a list), creates a string whose parts are separated by a single space character.
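The joining behaviour, including the treatment of a nested list, can be sketched in Java as follows. This is an illustration only: StringJoinSketch and toStringValue() are hypothetical names, toStringValue() merely approximates UPL's to-string() for the value kinds used in the example, and String.join() requires Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StringJoinSketch {
    // Assumed approximation of to-string(): a nested list becomes its
    // members' string values separated by one space.
    static String toStringValue(Object value) {
        if (value instanceof List) {
            return join((List<?>) value, " ");
        }
        return String.valueOf(value);
    }

    static String join(List<?> values, String separator) {
        List<String> parts = new ArrayList<>();
        for (Object value : values) {
            parts.add(toStringValue(value));
        }
        // Yields the zero-length string for an empty list.
        return String.join(separator, parts);
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("a", "b", "c"), ", ")); // a, b, c
        System.out.println(join(
                Arrays.asList("a", Arrays.asList("b1", "b2", "b3"), "c", 5.123), "/"));
        // a/b1 b2 b3/c/5.123
    }
}
```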


14.23. string-length

string-length(theString as String) as Numeric

theString  String  a string

Returns the length in characters of the passed string argument.

14.24. string-to-codepoints

string-to-codepoints(s as String) as List

s  String  a string

Returns the List of Unicode code points (as Numerics) that constitute the String s. If s is a zero-length string, an empty List is returned.
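In Java terms, the described result corresponds to iterating over a string's code points rather than its UTF-16 code units. The sketch below assumes that UPL reports a character outside the Basic Multilingual Plane as one code point, which is what "Unicode code points" suggests but the text does not spell out; the class name is hypothetical and String.codePoints() requires Java 8 or later.

```java
import java.util.Arrays;

final class CodepointsSketch {
    // Returns the code points of s in order; empty for a zero-length string.
    static int[] stringToCodepoints(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // 'a' is U+0061 (97); \u00E4 is U+00E4 (228).
        System.out.println(Arrays.toString(stringToCodepoints("a\u00E4"))); // [97, 228]
        // A surrogate pair forms a single code point, U+1F600 (128512).
        System.out.println(Arrays.toString(stringToCodepoints("\uD83D\uDE00"))); // [128512]
    }
}
```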

14.25. substring

substring(sourceString as String, startingLoc as Numeric) as String

sourceString  String   a string
startingLoc   Numeric  start index (1-based)

Returns the portion of the value of sourceString beginning at the position indicated by the value of startingLoc. The characters returned do not extend beyond sourceString. If startingLoc is zero or negative, only those characters in positions greater than zero are returned.

14.26. substring

substring(sourceString as String, startingLoc as Numeric, length as Numeric) as String

sourceString  String   a string
startingLoc   Numeric  start index (1-based)
length        Numeric  number of characters

Returns the portion of the value of sourceString beginning at the position indicated by the value of startingLoc and continuing for the number of characters indicated by the value of length.

The characters returned do not extend beyond sourceString. If startingLoc is zero or negative, only those characters in positions greater than zero are returned.
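The clamping rules for both forms of substring can be illustrated with the following Java sketch. It rests on an assumption that the text does not state explicitly: the selected range is taken to cover the 1-based positions from startingLoc up to, but not including, startingLoc + length, with positions below 1 or beyond the end of the string simply dropped (the reading used by the XPath substring function, whose wording the description resembles). The sketch handles integer arguments only, and all names are hypothetical.

```java
final class SubstringSketch {
    // Two-argument form: everything from startingLoc (1-based) to the end.
    static String substring(String s, int startingLoc) {
        int from = Math.max(startingLoc, 1);
        return from > s.length() ? "" : s.substring(from - 1);
    }

    // Three-argument form: positions p with startingLoc <= p < startingLoc + length,
    // restricted to positions that actually exist in s (assumed reading).
    static String substring(String s, int startingLoc, int length) {
        int from = Math.max(startingLoc, 1);
        long endExclusive = (long) startingLoc + length;
        int to = (int) Math.min(endExclusive, (long) s.length() + 1);
        return to <= from ? "" : s.substring(from - 1, to - 1);
    }

    public static void main(String[] args) {
        System.out.println(substring("12345", 2));     // 2345
        System.out.println(substring("12345", 2, 3));  // 234
        System.out.println(substring("12345", 4, 10)); // 45
        // Under the assumed reading, positions 0, 1, 2 are requested and 0 is dropped.
        System.out.println(substring("12345", 0, 3));  // 12
    }
}
```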

14.27. substring-after

substring-after(text as String, afterThis as String) as String

text       String  a string
afterThis  String  string to search for

Returns the substring from text that follows the first occurrence of afterThis. If text does not contain afterThis, it returns the empty string.

14.28. substring-before

substring-before(text as String, beforeThis as String) as String

text        String  a string
beforeThis  String  string to search for

Returns the substring from text that precedes the first occurrence of beforeThis. If text does not contain beforeThis, it returns the empty string.
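Both substring-after and substring-before split text at the first occurrence of the search string. A minimal Java sketch of that behaviour follows; the class and method names are hypothetical, and the handling of a zero-length search string is not addressed by the text and is therefore not shown.

```java
final class SubstringAroundSketch {
    // Text following the first occurrence of afterThis, or "" if there is none.
    static String substringAfter(String text, String afterThis) {
        int index = text.indexOf(afterThis);
        return index < 0 ? "" : text.substring(index + afterThis.length());
    }

    // Text preceding the first occurrence of beforeThis, or "" if there is none.
    static String substringBefore(String text, String beforeThis) {
        int index = text.indexOf(beforeThis);
        return index < 0 ? "" : text.substring(0, index);
    }

    public static void main(String[] args) {
        System.out.println(substringAfter("key=a=b", "="));  // a=b
        System.out.println(substringBefore("key=a=b", "=")); // key
        System.out.println(substringAfter("key", "=").isEmpty()); // true
    }
}
```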

14.29. substring-tail

substring-tail(sourceString as String, length as Numeric) as String

sourceString  String   a string
length        Numeric  number of characters

Returns the last length characters of sourceString.

When length is greater than the size of sourceString, the empty string is returned.

Example 5.125. 

substring-tail( "Hello", 2 )

returns "lo".

substring-tail( "abc", 5 )

returns the empty string.
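The two examples above can be reproduced with the following Java sketch. The names are hypothetical, and returning the empty string for a length of zero or less is an assumption, because the text only covers the case where length exceeds the size of sourceString.

```java
final class SubstringTailSketch {
    // Last `length` characters of s; "" when length exceeds the size of s.
    static String substringTail(String s, int length) {
        if (length <= 0 || length > s.length()) {
            // length <= 0 is an assumed case; length > size is specified.
            return "";
        }
        return s.substring(s.length() - length);
    }

    public static void main(String[] args) {
        System.out.println(substringTail("Hello", 2));         // lo
        System.out.println(substringTail("abc", 5).isEmpty()); // true
    }
}
```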


14.30. substring-tail

substring-tail(sourceString as String, startingLoc as Numeric, length as Numeric) as String

sourceString  String   a string
startingLoc   Numeric  starting index (1-based)
length        Numeric  number of characters

Returns length characters of sourceString, counted toward the beginning of the string and starting at the startingLoc-th last character of sourceString. When startingLoc is smaller than 1, it is set to 1. When startingLoc is greater than the size of sourceString, the empty string is returned. When length is less than or equal to zero, the empty string is returned. When length is greater than the number of characters available from the starting character (inclusive) toward the beginning of sourceString, only the available characters are returned.

Example 5.126. 

substring-tail( "Hello", 2, 2 )

returns "ll".

substring-tail( "abc", 2, 5 )

returns "ab".

substring-tail( "abc", 4, 2 )

returns "".
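The rules for the three-argument form translate into a short index computation. The Java sketch below follows the description literally and reproduces the three examples; the class and method names are hypothetical.

```java
final class SubstringTail3Sketch {
    static String substringTail(String s, int startingLoc, int length) {
        if (startingLoc < 1) {
            startingLoc = 1; // values below 1 are set to 1
        }
        if (startingLoc > s.length() || length <= 0) {
            return "";
        }
        // 0-based exclusive end: just behind the startingLoc-th last character.
        int endExclusive = s.length() - startingLoc + 1;
        // Move toward the beginning, but never past the first character.
        int begin = Math.max(0, endExclusive - length);
        return s.substring(begin, endExclusive);
    }

    public static void main(String[] args) {
        System.out.println(substringTail("Hello", 2, 2)); // ll
        System.out.println(substringTail("abc", 2, 5));   // ab
        System.out.println(substringTail("abc", 4, 2).isEmpty()); // true
    }
}
```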


14.31. tokenize

tokenize(src as String, separatorPattern as String) as List

src               String  source string to be split up
separatorPattern  String  regular expression matching separator character(s)

Splits the string src around matches of the given regular expression separatorPattern.

The List returned by this function contains each substring of src that is terminated by another substring that matches the given expression or is terminated by the end of the string. The substrings in the resulting List are in the order in which they occur in src. If the expression does not match any part of the input, the resulting List has just one element, namely src.

Example 5.127. 

tokenize( ";a;b;;c;", ";")

will return

{"","a","b","","c"}
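The description of tokenize mirrors the wording of java.lang.String.split(String), and the example result matches what that method produces: the leading empty string is kept, while the trailing one is dropped. The following Java sketch assumes, without confirmation from the text, that tokenize behaves like split() in this respect; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class TokenizeSketch {
    // Assumed equivalent of tokenize(): split() with its default limit,
    // which removes trailing empty strings from the result.
    static List<String> tokenize(String src, String separatorPattern) {
        return Arrays.asList(src.split(separatorPattern));
    }

    public static void main(String[] args) {
        System.out.println(tokenize(";a;b;;c;", ";")); // [, a, b, , c]
        System.out.println(tokenize("abc", ";"));      // [abc]
    }
}
```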

14.32. trim

trim(s as String) as String

s  String  a string

Returns a copy of the string s, with leading and trailing whitespace omitted.

Whitespace for this function is defined as any of the following four characters: U+0009 (CHARACTER TABULATION), U+000A (LINE FEED (LF)), U+000D (CARRIAGE RETURN (CR)), U+0020 (SPACE).

In contrast to normalize-space(), whitespace characters within the string are not changed or minimized.
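The whitespace definition above is narrower than the one used by Java's own String.trim(), which removes every character up to U+0020. A minimal Java sketch restricted to exactly the four listed characters could look as follows; the class and method names are hypothetical.

```java
final class TrimSketch {
    // Exactly the four characters the text defines as whitespace.
    static boolean isWhitespace(char c) {
        return c == '\t' || c == '\n' || c == '\r' || c == ' ';
    }

    static String trim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isWhitespace(s.charAt(start))) {
            start++;
        }
        while (end > start && isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        // Inner whitespace is preserved: prints [a  b]
        System.out.println("[" + trim("\t  a  b \r\n") + "]");
        // U+00A0 (NO-BREAK SPACE) is not in the list, so it stays in place.
        System.out.println(trim("\u00A0x ").length()); // 2
    }
}
```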

14.33. unescape-characters

unescape-characters(source as String, mode as Id) as String

source  String  a string
mode    Id      BACKSLASH | DOUBLING

Unescapes any escaped characters in source using the specified mode.

Currently, the following unescaping modes are supported:

BACKSLASH

This mode unescapes any character that is escaped (that is, preceded) by the backslash '\' character.

DOUBLING

This mode unescapes any pair of consecutive, identical characters.

Example 5.128. 

unescape-characters( "redeem 1000$", DOUBLING )

will return the string redem 100$ .

unescape-characters( "redeem 10000$", DOUBLING )

will also return the string redem 100$ .

unescape-characters( """C:\\te\st""", BACKSLASH )

will return the string C:\test .


14.34. unescape-characters

unescape-characters(source as String, mode as Id, characters as String) as String

source      String  a string
mode        Id      BACKSLASH | DOUBLING
characters  String  the character(s) to unescape

Unescapes those escaped characters in source that are listed in characters, using the specified mode.

Currently, the following unescaping modes are supported:

BACKSLASH

This mode unescapes any character listed in characters that is escaped (that is, preceded) by the backslash '\' character.
Note that you must explicitly include the '\' character in characters if you want it to be unescaped as well when preceded by a backslash!

DOUBLING

This mode unescapes any pair of consecutive, identical characters that are listed in characters.

Example 5.129. 

unescape-characters( "Massachusetts", DOUBLING, "st" )

will return the string Masachusets .

unescape-characters( """C:\\te\st""", BACKSLASH, "es" )

will return the string C:\\test .

unescape-characters( """C:\\te\st""", BACKSLASH, """\es""" )

will return the string C:\test .
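Both forms of unescape-characters can be modelled by a single left-to-right scan. The Java sketch below reproduces the documented examples under these assumptions: passing null for characters stands for the two-argument form (every character is eligible), an escape is consumed as a unit so that scanning resumes behind it, and a backslash at the very end of source is left untouched. The class, enum and method names are hypothetical.

```java
final class UnescapeSketch {
    enum Mode { BACKSLASH, DOUBLING }

    // characters == null models the two-argument form of the function.
    static String unescape(String source, Mode mode, String characters) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (i + 1 < source.length()) {
                char next = source.charAt(i + 1);
                boolean eligible = characters == null || characters.indexOf(next) >= 0;
                boolean escaped = (mode == Mode.BACKSLASH) ? c == '\\' : c == next;
                if (escaped && eligible) {
                    out.append(next); // keep only the escaped character
                    i++;              // consume both characters of the escape
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Java literal for the nine characters  C : \ \ t e \ s t
        String path = "C:\\\\te\\st";
        System.out.println(unescape("redeem 1000$", Mode.DOUBLING, null));   // redem 100$
        System.out.println(unescape("Massachusetts", Mode.DOUBLING, "st")); // Masachusets
        System.out.println(unescape(path, Mode.BACKSLASH, null));  // C:\test
        System.out.println(unescape(path, Mode.BACKSLASH, "es"));  // C:\\test
    }
}
```

With "es" as the character list, neither the backslash pair nor the backslash before "t" qualifies, so only the backslash before "s" is removed.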


14.35. upper-case

upper-case(text as String) as String

text  String  a string

Returns the passed string converted to all upper case. The result is the same as calling Java's java.lang.String.toUpperCase() on text.
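Since the result is defined through java.lang.String.toUpperCase(), Java's case mapping rules apply, including mappings that change the length of the string. The sketch below passes Locale.ROOT to obtain reproducible output; this is an assumption made for the example only, because the no-argument method named in the text uses the default locale of the Java VM. The class name is hypothetical.

```java
import java.util.Locale;

final class UpperCaseSketch {
    static String upperCase(String text) {
        // Locale.ROOT is an assumption for reproducibility; the no-argument
        // toUpperCase() depends on the default locale of the Java VM.
        return text.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(upperCase("upl")); // UPL
        // \u00DF (sharp s) maps to "SS", so the result is one character longer.
        System.out.println(upperCase("stra\u00DFe")); // STRASSE
    }
}
```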