Description
The string class stores a text value in UTF-8.
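Because the text is stored as UTF-8, the byte count reported by size() and length() can differ from the code-point count reported by length32(). The C++ sketch below is not part of this API. It uses std::string as a stand-in and a hypothetical helper, count_code_points, to show the difference, and it assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a UTF-8 encoded std::string by
// counting every byte that is NOT a continuation byte (10xxxxxx).
// Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;  // a single byte or the lead byte of a multibyte sequence
        }
    }
    return count;
}
```

For "über" with a precomposed ü (bytes C3 BC 62 65 72), std::string::size() is 5 while count_code_points returns 4.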
Operators
- char& operator[]( int index ) – Accesses the char (byte) at position index. Throws an exception if index is out of range.
- string& operator =( string s ) – Copies the value of string s. Returns this object.
- string& operator =( char32_t c ) – Sets the value to the code point c. Returns this object.
- string& operator +=( string s ) – Appends the value of string s. Returns this object.
- string& operator +=( char c ) – Appends the character c. Returns this object.
- bool operator ==( string s ) – Returns true if both strings contain identical text.
- bool operator !=( string s ) – Returns true if the strings represent different text.
- bool operator <=( string s ) – Returns true if this string is lexicographically less than or equal to s.
- bool operator >=( string s ) – Returns true if this string is lexicographically greater than or equal to s.
- bool operator <( string s ) – Returns true if this string is lexicographically less than s.
- bool operator >( string s ) – Returns true if this string is lexicographically greater than s.
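This reference does not state whether the comparison operators work on bytes or on code points. For valid UTF-8 the two orders agree: comparing the encoded bytes as unsigned values sorts strings in the same order as comparing their code points. The C++ sketch below illustrates this with std::string, whose comparisons go through std::char_traits<char> and therefore treat each char as unsigned char. compare_utf8 is a hypothetical helper, and the assumption that this class compares the same way is unverified.

```cpp
#include <string>

// Hypothetical helper: three-way comparison of two UTF-8 strings.
// std::string::compare uses std::char_traits<char>, which compares chars as
// unsigned char, so for valid UTF-8 the result matches code-point order even
// on platforms where plain char is signed.
int compare_utf8(const std::string& a, const std::string& b) {
    const int r = a.compare(b);
    return (r > 0) - (r < 0);  // normalize to -1, 0, or 1
}
```

For example, "z" (U+007A) sorts before "ü" (U+00FC, bytes C3 BC) because 0x7A is less than 0xC3 as an unsigned byte.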
Methods
- size_t find(string s) – Finds the first instance of s, returning the 0-based index. When not found, returns string_npos.
- size_t find(string s, size_t index) – Finds the first instance of s starting at offset index, returning the 0-based index. When not found, returns string_npos.
- size_t rfind(string s) – Finds the last instance of s, returning the 0-based index. When not found, returns string_npos.
- size_t rfind(string s, size_t index) – Finds the last instance of s starting at offset index, returning the 0-based index. When not found, returns string_npos.
- size_t find_first_of(string s) – Finds the first matching character from the characters in s, returning the 0-based index. When not found, returns string_npos.
- size_t find_first_of(string s, size_t index) – Finds the first matching character from the characters in s starting at offset index, returning the 0-based index. When not found, returns string_npos.
- size_t find_first_not_of(string s) – Finds the first non-matching character from the characters in s, returning the 0-based index. When not found, returns string_npos.
- size_t find_first_not_of(string s, size_t index) – Finds the first non-matching character from the characters in s starting at offset index, returning the 0-based index. When not found, returns string_npos.
- size_t find_last_of(string s) – Finds the last matching character from the characters in s, returning the 0-based index. When not found, returns string_npos.
- size_t find_last_of(string s, size_t index) – Finds the last matching character from the characters in s starting at offset index, returning the 0-based index. When not found, returns string_npos.
- size_t find_last_not_of(string s) – Finds the last non-matching character from the characters in s, returning the 0-based index. When not found, returns string_npos.
- size_t find_last_not_of(string s, size_t index) – Finds the last non-matching character from the characters in s starting at offset index, returning the 0-based index. When not found, returns string_npos.
- void insert_at(int index, char c)¹ – Inserts the character c at position index. Throws an exception if index is out of range.
- void insert32(int index, char32_t c)¹ – Inserts the code point c before the code point at position index. Throws an exception if index is out of range.
- void erase_at(int index)¹ – Removes the character at position index. Throws an exception if index is out of range.
- void push_back(char c) – Appends the character c.
- void append32(char32_t c) – Appends the code point c.
- void clear() – Empties the string.
- bool empty() – Returns true if the string is empty.
- size_t size() – Returns the length of the string in bytes.
- size_t length() – Returns the length of the string in bytes (same as size()).
- int length32() – Returns the number of code points in the string.
- string substr(size_t index, size_t len)¹ – Returns a substring starting at offset index with length len. Pass string_npos as len to return the rest of the string. Throws an exception if index plus len is out of range.
- string substr32(int index, int count) – Returns a substring starting with the code point at index and containing up to count code points. Pass string_npos as count to return the rest of the string. Throws an exception if index is out of range.
- string mid( size_t index )¹ – Returns a substring starting at offset index. Does not throw an exception.
- string mid( size_t index, size_t len )¹ – Returns a substring starting at offset index with a length of up to len. Does not throw an exception.
- string mid32( int index ) – Returns a substring starting with the code point at index. Does not throw an exception.
- string mid32( int index, int count ) – Returns a substring starting with the code point at index and containing up to count code points. Does not throw an exception.
- string left( size_t len )¹ – Returns a substring of up to the first len characters (bytes). Does not throw an exception.
- string left32( int count ) – Returns a substring containing up to the first count code points. Does not throw an exception.
- string right( size_t len )¹ – Returns a substring of up to the last len characters (bytes). Does not throw an exception.
- string right32( int count ) – Returns a substring containing up to the last count code points. Does not throw an exception.
- char32_t at32( int index ) – Returns the code point at position index. Throws an exception if index is out of range.
- string& replace( size_t start, size_t count, string newStr )¹ – Replaces the part of the string indicated by [start, start + count) with the text in newStr (newStr can be empty).
- string& replace32( int start, int count, string newStr ) – Replaces the part of the string indicated by [start, start + count), measured in code points, with the text in newStr (newStr can be empty).
- string& replaceAll( string oldStr, string newStr ) – Replaces each instance of oldStr with the text in newStr (newStr can be empty).
- string& replaceFirst( string oldStr, string newStr ) – Replaces the first instance of oldStr with the text in newStr (newStr can be empty).
- string& replaceFirst( string oldStr, string newStr, startPosition ) – Replaces the first instance of oldStr, searching from index startPosition, with the text in newStr (newStr can be empty).
- string& replaceLast( string oldStr, string newStr ) – Replaces the last instance of oldStr with the text in newStr (newStr can be empty).
- string& truncate( size_t newLength ) – Shortens the string to newLength bytes. The result may be shorter than newLength if cutting at exactly newLength would break a UTF-8 multibyte (code-point) sequence.
- string& truncate32( int count ) – Shortens the string to at most count code points.
- string ltrim() – Removes whitespace from the beginning of the string, returning a new string.
- string& ltrim_self() – Removes whitespace characters from the beginning of this string.
- string& ltrim_self( string s ) – Removes any characters found in s from the beginning of this string.
- string rtrim() – Removes whitespace from the end of the string, returning a new string.
- string& rtrim_self() – Removes whitespace characters from the end of this string.
- string& rtrim_self( string s ) – Removes any characters found in s from the end of this string.
- string trim() – Removes whitespace from the beginning and end of the string, returning a new string.
- string& trim_self() – Removes whitespace characters from the beginning and end of this string.
- string& trim_self( string s ) – Removes any characters found in s from the beginning and end of this string.
- string& reverse_self() – Reverses the order of this string, keeping the bytes of each code point together.
- string& toLowerASCII() – Converts all ASCII characters to their ASCII lowercase equivalents.
- string& toLower( Locale l ) – Converts all characters to their lowercase equivalents using the rules of the provided Locale.
- string& toUpperASCII() – Converts all ASCII characters to their ASCII uppercase equivalents.
- string& toUpper( Locale l ) – Converts all characters to their uppercase equivalents using the rules of the provided Locale.
- void reserve( size_t v ) – Pre-allocates a buffer of size v for future use.
- bool u_normalizeNFC() – Applies NFC (composed) Unicode normalization. Returns true if the string was normalized.
- bool u_normalizeNFD() – Applies NFD (decomposed) Unicode normalization. Returns true if the string was normalized.
- u32string to_u32string() – Returns the contents of the string encoded as UTF-32. See u32string.
¹ These functions can break a UTF-8 multibyte sequence. Where one exists, consider using the equivalent function with the "32" suffix.
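The footnoted byte-based functions and truncate() both depend on where a UTF-8 sequence may safely be cut. The C++ sketch below shows one common way to truncate to a byte length without leaving a partial sequence, matching the behavior described for truncate(). It is an illustration using std::string and hypothetical helper names (truncate_utf8, is_trail_byte), not this class's actual implementation, and it assumes valid UTF-8 input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true for UTF-8 continuation (trail) bytes, 10xxxxxx.
static bool is_trail_byte(unsigned char byte) {
    return (byte & 0xC0) == 0x80;
}

// Hypothetical helper: shortens utf8 to at most new_length bytes without
// leaving a partial multibyte sequence at the end.
void truncate_utf8(std::string& utf8, std::size_t new_length) {
    if (new_length >= utf8.size()) {
        return;  // already short enough
    }
    std::size_t cut = new_length;
    // utf8[cut] is the first byte to be removed. If it is a trail byte, its
    // sequence began earlier, so move the cut back to that sequence's lead byte.
    while (cut > 0 && is_trail_byte(static_cast<unsigned char>(utf8[cut]))) {
        --cut;
    }
    utf8.erase(cut);  // remove everything from cut to the end
}
```

Truncating "über" (C3 BC 62 65 72) to 1 byte yields an empty string, because keeping only C3 would leave half of ü.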
String Iteration
The string class represents text in UTF-8. If you iterate a string using the for( c : s ) syntax, the loop visits the string byte by byte, potentially splitting code points. This can be resolved in one of two ways:
- Use for( c : s.to_u32string() ). This creates a temporary u32string, allowing you to iterate code point by code point.
- Use the UTF-8 iteration functions below:
- char32_t utf8_next( int& index ) – Used for iterating a string by code point: returns the code point at index, then moves index forward to the next code point. Throws an exception if index is out of bounds. For example, the following outputs each of the four code points in order:
    var s = "über";
    if ( !s.empty() ) {
        // index is a byte offset; utf8_next advances it past each code point
        for ( var index = 0; index < s.length(); ) {
            var c = s.utf8_next( index );
            Ext.WriteStream( c.to_string() );
        }
    }
- char32_t utf8_prev( int& index ) – Used for iterating a string by code point: moves index backward to the previous code point, then returns the code point at that index. Throws an exception if index is out of bounds. For example, the following outputs each of the four code points in reverse order:
    var s = "über";
    if ( !s.empty() ) {
        // start one past the last byte; utf8_prev steps back before reading
        for ( var index = int(s.length()); index > 0; ) {
            var c = s.utf8_prev( index );
            Ext.WriteStream( c.to_string() );
        }
    }
- void utf8_skipForward( int& index, int count ) – Used for iterating a string by code point: moves index forward by count code points. Throws an exception if index is out of bounds. For example, the following outputs the fourth code point of the string ("r"):
    var s = "über";
    var index = 0;
    s.utf8_skipForward( index, 3 ); // skip ü, b, and e
    Ext.WriteStream( s.at32( index ).to_string() );
- void utf8_skipBackward( int& index, int count ) – Used for iterating a string by code point: moves index backward by count code points. Throws an exception if index is out of bounds. For example, the following outputs the second code point of the string ("b"):
    var s = "über";
    var index = int( s.length() ); // index must be an int
    s.utf8_skipBackward( index, 3 ); // step back over r, e, and b
    Ext.WriteStream( s.at32( index ).to_string() );
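The forward and backward stepping performed by utf8_next and utf8_prev can be sketched in C++ as follows. The names next_code_point and prev_code_point are hypothetical, std::size_t replaces int for the index, std::out_of_range stands in for the unspecified exception type, and the input is assumed to be valid UTF-8. Treating index 0 as out of bounds for prev_code_point is also a choice of this sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical counterpart of utf8_next: decodes the code point that starts at
// byte offset index, then advances index past it. Assumes valid UTF-8 with
// index positioned on a single or lead byte.
char32_t next_code_point(const std::string& s, std::size_t& index) {
    if (index >= s.size()) {
        throw std::out_of_range("index out of bounds");
    }
    const unsigned char lead = static_cast<unsigned char>(s[index]);
    // The high bits of the lead byte give the sequence length.
    const std::size_t length = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    if (index + length > s.size()) {
        throw std::out_of_range("truncated UTF-8 sequence");
    }
    // Keep only the payload bits of the lead byte.
    char32_t cp = length == 1 ? lead : length == 2 ? (lead & 0x1F)
                : length == 3 ? (lead & 0x0F) : (lead & 0x07);
    // Each trail byte (10xxxxxx) contributes 6 more bits.
    for (std::size_t i = 1; i < length; ++i) {
        cp = (cp << 6) | (static_cast<unsigned char>(s[index + i]) & 0x3F);
    }
    index += length;
    return cp;
}

// Hypothetical counterpart of utf8_prev: moves index back to the start of the
// previous code point, then decodes the code point there.
char32_t prev_code_point(const std::string& s, std::size_t& index) {
    if (index == 0 || index > s.size()) {
        throw std::out_of_range("index out of bounds");
    }
    // Step back over trail bytes until reaching the lead (or single) byte.
    do {
        --index;
    } while (index > 0 && (static_cast<unsigned char>(s[index]) & 0xC0) == 0x80);
    std::size_t probe = index;  // decode without moving index forward again
    return next_code_point(s, probe);
}
```

As in the utf8_prev example above, a backward loop starts at s.size(), one past the last byte.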
Unicode Utility Methods
- bool utf8_isSingle( int index ) – Returns true if the byte at index is a single-byte UTF-8 sequence. Throws an exception if index is out of bounds.
- bool utf8_isLead( int index ) – Returns true if the byte at index is the first byte of a multibyte UTF-8 sequence. Throws an exception if index is out of bounds.
- bool utf8_isTrail( int index ) – Returns true if the byte at index is a trailing byte of a multibyte UTF-8 sequence. Throws an exception if index is out of bounds.
- int utf8_cpLength( int index ) – Returns the length in bytes of the UTF-8 sequence at index. Throws an exception if index is out of bounds.
- bool u_isUpperCase( int index ) – Returns true if the code point at position index is an uppercase letter (equivalent to java.lang.Character.isUpperCase()). Throws an exception if index is out of bounds.
- bool u_isLowerCase( int index ) – Returns true if the code point at position index is a lowercase letter (equivalent to java.lang.Character.isLowerCase()). Throws an exception if index is out of bounds.
- bool u_isSpace( int index ) – Returns true if the code point at position index is a white space character; similar to C/POSIX isspace(). Throws an exception if index is out of bounds.
- bool u_isSpaceChar( int index ) – Returns true if the code point at position index is a space character (equivalent to java.lang.Character.isSpaceChar()). Throws an exception if index is out of bounds.
- bool u_isWhitespace( int index ) – Returns true if the code point at position index is a white space character (similar to java.lang.Character.isWhitespace()). Throws an exception if index is out of bounds.
- bool u_isLetter( int index ) – Returns true if the code point at position index is a letter (equivalent to java.lang.Character.isLetter()). Throws an exception if index is out of bounds.
- bool u_isDigit( int index ) – Returns true if the code point at position index is a digit (equivalent to java.lang.Character.isDigit()). Throws an exception if index is out of bounds.
- bool u_isLetterOrDigit( int index ) – Returns true if the code point at position index is alphanumeric, that is, a letter or a digit (equivalent to java.lang.Character.isLetterOrDigit()). Throws an exception if index is out of bounds.
- bool u_isPunctuation( int index ) – Returns true if the code point at index is a punctuation character. Throws an exception if index is out of bounds.
- bool u_isBMP( int index ) – Returns true if the code point at index is in the Unicode Basic Multilingual Plane. Throws an exception if index is out of bounds.
- bool u_isGraphic( int index ) – Returns true if the code point at index is a "graphic" character (printable, excluding spaces). Throws an exception if index is out of bounds.
- bool u_isPrintable( int index ) – Returns true if the code point at index is a printable character. Throws an exception if index is out of bounds.
- bool u_isBlank( int index ) – Returns true if the code point at index is a "blank" or "horizontal space" character, that is, one that visibly separates words on a line. Throws an exception if index is out of bounds.
- bool u_isDefined( int index ) – Returns true if the code point at index is "defined", which usually means it is assigned a character in Unicode (equivalent to java.lang.Character.isDefined()). Throws an exception if index is out of bounds.
- int u_charType( int index ) – Returns the Unicode general category of the code point at position index (equivalent to java.lang.Character.getType()). The return value can be tested against UnicodeCharTypeConstants. Throws an exception if index is out of bounds. See UnicodeCharTypeConstants.
- int u_digit( int index, int radix ) – Returns the numeric value of the code point at position index when interpreted as a digit in the specified radix. Throws an exception if index is out of bounds.
- void u_toLower( int index ) – Changes the code point at index to lowercase using non-locale-based Unicode mapping. Throws an exception if index is out of bounds.
- void u_toUpper( int index ) – Changes the code point at index to uppercase using non-locale-based Unicode mapping. Throws an exception if index is out of bounds.
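The byte classes tested by utf8_isSingle, utf8_isLead, utf8_isTrail, and utf8_cpLength follow from the UTF-8 bit patterns: 0xxxxxxx for a single byte, 110xxxxx, 1110xxxx, or 11110xxx for a lead byte, and 10xxxxxx for a trail byte. The C++ sketch below encodes those rules for a std::string. The function names are hypothetical, the lead-byte range C2 to F4 follows RFC 3629 (which rules out C0, C1, and F5 to FF), and returning 0 from code_point_length for a byte that cannot start a sequence is an arbitrary choice of this sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads a byte as unsigned; std::string::at throws
// std::out_of_range when index is not a valid offset.
static unsigned char byte_at(const std::string& s, std::size_t index) {
    return static_cast<unsigned char>(s.at(index));
}

// 0xxxxxxx: a complete one-byte (ASCII) sequence.
bool is_single(const std::string& s, std::size_t index) {
    return byte_at(s, index) < 0x80;
}

// First byte of a 2- to 4-byte sequence (C2..F4 in well-formed UTF-8).
bool is_lead(const std::string& s, std::size_t index) {
    const unsigned char b = byte_at(s, index);
    return b >= 0xC2 && b <= 0xF4;
}

// 10xxxxxx: a continuation byte inside a multibyte sequence.
bool is_trail(const std::string& s, std::size_t index) {
    return (byte_at(s, index) & 0xC0) == 0x80;
}

// Byte length of the sequence starting at index, or 0 if the byte there
// cannot start a sequence (trail bytes and invalid bytes).
int code_point_length(const std::string& s, std::size_t index) {
    const unsigned char b = byte_at(s, index);
    if (b < 0x80) return 1;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}
```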
Copyright © 2007–2019 Micro Focus or one of its affiliates. All rights reserved.