StringUtf8 Type

Use the StringUtf8 primitive type to define StringUtf8 variables and attributes; that is, strings that have been encoded in the UTF8 format. This allows all valid Unicode characters to be used even in an ANSI system. A character string contains zero or more characters. A null string is a string that has a zero length (""). You can access characters in a string as components of an array.

When you specify a length less than or equal to 540 for a StringUtf8 attribute, it is embedded. Space is allocated within instances of the class to store a string with a length less than or equal to the specified length.

When you specify a length greater than 540 or you select the Maximum Length check box (which corresponds to 2,147,483,647 characters) for a StringUtf8 attribute, it is not embedded. It is stored in a separate variable-length object, a StringUtf8 Large Object (slobutf8), which can store a string with a length less than or equal to the specified length. The amount of storage required for a slobutf8 is determined by the value of the string.

StringUtf8 variables can be bounded or unbounded, as shown in the following code fragment.

vars
   s1 : StringUtf8[100]; // Bounded - s1 can store a string with a
                         // length less than or equal to 100 characters
   s2 : StringUtf8;      // Unbounded - s2 can store a string with a length
                         // less than or equal to 2,147,483,647 characters

The ordering between two string values is set by the ordering relationship of the character values in corresponding positions. In strings of unequal length, each character in the longer string that has no corresponding character in the shorter string is treated as greater; for example, Zs is greater than Z. A null string can be equal only to another null string.
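The following code fragment is a minimal sketch of these ordering rules. The variable names are illustrative only, and the fragment assumes that the relational operators and the write instruction apply to StringUtf8 values in the same way as they do to String values.

```jade
vars
    s1 : StringUtf8;
    s2 : StringUtf8;
begin
    s1 := @"Z";
    s2 := @"Zs";
    // The first characters match and s2 has an extra character with no
    // counterpart in s1, so s2 is expected to order after s1
    if s1 < s2 then
        write "Z is less than Zs";
    endif;
end;
```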

To specify a substring of a string str, use the form str[m:n]; that is, two integers separated by a colon (:) character. The first integer is the start position and the second integer is the length of the substring. In place of the second integer, specify end to indicate that the substring extends to the end of the string. Character positions start at 1, so for a substring starting at the first character of the string, the first integer is 1.

If the length of a substring is zero (0), a null string ("") is returned.
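The substring forms described above can be sketched as shown in the following code fragment. The variable names are illustrative only, and the values in the comments follow from the start position and length rules stated above.

```jade
vars
    str : StringUtf8;
    sub : StringUtf8;
begin
    str := @"JADE Primitive Types";
    sub := str[1:4];       // start position 1, length 4: "JADE"
    sub := str[6:9];       // start position 6, length 9: "Primitive"
    sub := str[16:end];    // start position 16 to the end of the string: "Types"
    sub := str[6:0];       // zero length: null string ("")
end;
```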

You can ignore the fact that a non-ASCII character in a UTF8 string requires more than one byte of storage, as the start position and length integers are based on character positions rather than on byte positions.
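As a minimal sketch of this behavior, the following code fragment takes substrings of a string that contains a non-ASCII character. It uses the &#169; notation for the copyright character that is described later in this topic, and the variable names are illustrative only. The copyright character (U+00A9) requires two bytes in UTF8 but occupies a single character position.

```jade
vars
    str : StringUtf8;
    sub : StringUtf8;
begin
    str := @"Copyright &#169; Jade Software";
    sub := str[11:1];      // the copyright character, which is character 11
    sub := str[13:4];      // "Jade"; positions count characters, so the
                           // two-byte character before it counts as one
end;
```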

A StringUtf8 literal is enclosed in double ("") or single ('') quotation marks, and is usually preceded by an at sign (@), as shown in the following example.

stringUtf8 := @"Jade Software";

If all the characters are US-ASCII characters, as in the preceding example, the @ sign is optional.

The StringUtf8 literal can contain a non-US-ASCII character, by enclosing a value representing the character between an ampersand (&) character and a semicolon (;) character, as shown in the following examples.

stringUtf8 := @"Copyright &copy; Jade Software";

stringUtf8 := @"Copyright &#169; Jade Software";

stringUtf8 := @"Copyright &#xA9; Jade Software";

In the first example, a character entity reference as defined in the HTML 4 standard is used. In the second and third examples, the value of the Unicode code point of the character in decimal and in hexadecimal is used.

A variable of type StringUtf8 can be used to reference a single character in a string, in effect treating the string as an array of one-character UTF8 strings, as shown in the following code fragment.

vars
   str1 : StringUtf8;
   str2 : StringUtf8;
begin
   str1 := @"JADE Primitive Types";
   str2 := str1[7];       // UTF8 string consisting of seventh character 'r'

For details about the methods defined in the StringUtf8 primitive type, see "StringUtf8 Methods", in the following subsection. For details about converting primitive types, see "Converting Primitive Types", in Chapter 1 of the JADE Developer’s Reference.