string

Description

The string class stores a text value in UTF-8.
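
Because the text is held as UTF-8, the number of bytes and the number of code points can differ for the same value. This is why the class offers both byte-based members (for example size()) and code point members (for example length32()). The short Python sketch below only illustrates that distinction. Python is not the scripting language documented here, and the helper names byte_length and code_point_length are hypothetical.

```python
def byte_length(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def code_point_length(text: str) -> int:
    """Number of Unicode code points in the text."""
    return len(text)


word = "\u00fcber"  # "über" written with the precomposed code point U+00FC
print(byte_length(word))        # 5, because U+00FC needs two bytes in UTF-8
print(code_point_length(word))  # 4
```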

Constructors

string() – Creates an empty string object.
string(string s) – Creates a string with a copy of s.

Operators

char& operator[]( int index ) – Accesses value at location index. Throws an exception if index is out of range.
string& operator =( string s ) – Copies the values from string s. Returns this object.
string& operator =( char32_t c ) – Sets the value to the code point c. Returns this object.
string& operator +=( string s ) – Appends the value from string s. Returns this object.
string& operator +=( char c ) – Appends the character c. Returns this object.
bool operator ==( string s ) – Returns true if both strings contain identical text.
bool operator !=( string s ) – Returns true if the strings represent different text.
bool operator <=( string s ) – Returns true if this string is lexicographically less than or equal to s.
bool operator >=( string s ) – Returns true if this string is lexicographically greater than or equal to s.
bool operator <( string s ) – Returns true if this string is lexicographically less than s.
bool operator >( string s ) – Returns true if this string is lexicographically greater than s.

Methods

size_t find(string s) – Finds the first instance of s, returning the 0-based index. When not found, returns string_npos.
size_t find(string s, size_t index) – Finds the first instance of s starting at offset index, returning the 0-based index. When not found, returns string_npos.
size_t rfind(string s) – Finds the last instance of s, returning the 0-based index. When not found, returns string_npos.
size_t rfind(string s, size_t index) – Finds the last instance of s starting at offset index, returning the 0-based index. When not found, returns string_npos.
size_t find_first_of(string s) – Finds the first matching character from the characters in s, returning the 0-based index. When not found, returns string_npos.
size_t find_first_of(string s, size_t index) – Finds the first matching character from the characters in s starting at offset index, returning the 0-based index. When not found, returns string_npos.
size_t find_first_not_of(string s) – Finds the first non-matching character from the characters in s, returning the 0-based index. When not found, returns string_npos.
size_t find_first_not_of(string s, size_t index) – Finds the first non-matching character from the characters in s starting at offset index, returning the 0-based index. When not found, returns string_npos.
size_t find_last_of(string s) – Finds the last matching character from the characters in s, returning the 0-based index. When not found, returns string_npos.
size_t find_last_of(string s, size_t index) – Finds the last matching character from the characters in s starting at offset index, returning the 0-based index. When not found, returns string_npos.
size_t find_last_not_of(string s) – Finds the last non-matching character from the characters in s, returning the 0-based index. When not found, returns string_npos.
size_t find_last_not_of(string s, size_t index) – Finds the last non-matching character from the characters in s starting at offset index, returning the 0-based index. When not found, returns string_npos.
void insert_at(int index, char c)¹ – Inserts character c at position index. Throws an exception if index is out of range.
void insert32(int index, char32_t c)¹ – Inserts the code point c before the code point at position index. Throws an exception if index is out of range.
void erase_at(int index)¹ – Removes character at position index. Throws an exception if index is out of range.
void push_back(char c) – Appends character c.
void append32(char32_t c) – Appends the code point c.
void clear() – Empties the string.
bool empty() – Returns true if the string is empty.
size_t size() – Returns the length of the string in bytes.
size_t length() – Returns the length of the string in bytes.
int length32() – Returns the count of code points in the string.
string substr(size_t index, size_t len)¹ – Returns a substring starting at offset index with length of len. The value string_npos can be used for the len parameter to return the remaining string contents. Throws an exception if index plus len is out of range.
string substr32(int index, int count) – Returns a substring starting with the code point at index and containing up to count code points. The value string_npos can be used for the count parameter to return the remaining string contents. Throws an exception if index is out of range.
string mid( size_t index )¹ – Returns a substring starting at offset index. Does not throw an exception.
string mid( size_t index, size_t len )¹ – Returns a substring starting at offset index with length up to len. Does not throw an exception.
string mid32( int index ) – Returns a substring starting with the code point at index. Does not throw an exception.
string mid32( int index, int count ) – Returns a substring starting with the code point at index and containing up to count code points. Does not throw an exception.
string left( size_t len )¹ – Returns a substring with up to the first len characters. Does not throw an exception.
string left32( int count ) – Returns a substring containing up to count code points. Does not throw an exception.
string right( size_t len )¹ – Returns a substring with up to the last len characters. Does not throw an exception.
string right32( int count ) – Returns a substring containing up to the last count code points. Does not throw an exception.
char32_t at32( int index ) – Returns the code point at position index. Throws an exception if index is out of range.
string& replace( size_t start, size_t count, string newStr )¹ – Replaces the part of the string indicated by [start, start + count) with the text in newStr (newStr can be empty).
string& replace32( int start, int count, string newStr ) – Replaces the part of the string indicated by [start, start + count) (as code points) with the text in newStr (newStr can be empty).
string& replaceAll( string oldStr, string newStr ) – Replaces each instance of oldStr with the text in newStr (newStr can be empty).
string& replaceFirst( string oldStr, string newStr ) – Replaces the first instance of oldStr with the text in newStr (newStr can be empty).
string& replaceFirst( string oldStr, string newStr, startPosition ) – Replaces the first instance of oldStr with the text in newStr (newStr can be empty), starting at index startPosition.
string& replaceLast( string oldStr, string newStr ) – Replaces the last instance of oldStr with the text in newStr (newStr can be empty).
string& truncate( size_t newLength ) – Shortens the string text value to the new length. The result may be shorter than newLength if truncating at that point would break a Unicode code point sequence.
string& truncate32( int count ) – Shortens the string text to at most count code points.
string ltrim() – Removes whitespace from the beginning of the string, returning a new string.
string& ltrim_self() – Removes whitespace characters from the beginning of this string.
string& ltrim_self( string s ) – Removes any characters found in s from the beginning of this string.
string rtrim() – Removes whitespace from the end of the string, returning a new string.
string& rtrim_self() – Removes whitespace characters from the end of this string.
string& rtrim_self( string s ) – Removes any characters found in s from the end of this string.
string trim() – Removes whitespace from the beginning and end of the string, returning a new string.
string& trim_self() – Removes whitespace characters from the beginning and end of this string.
string& trim_self( string s ) – Removes any characters found in s from the beginning and end of this string.
string& reverse_self() – Reverses the ordering of this string, keeping the bytes of each Unicode code point together.
string& toLowerASCII() – Converts all ASCII upper case characters to their ASCII lower case equivalents.
string& toLower( Locale l ) – Converts all characters to their lower case equivalents using the rules of the Locale provided.
string& toUpperASCII() – Converts all ASCII lower case characters to their ASCII upper case equivalents.
string& toUpper( Locale l ) – Converts all characters to their upper case equivalents using the rules of the Locale provided.
void reserve( size_t v ) – Pre-allocates a buffer of size v for future use.
bool u_normalizeNFC() – Applies the NFC (composed) Unicode normalization. Returns true if the string was normalized.
bool u_normalizeNFD() – Applies the NFD (decomposed) Unicode normalization. Returns true if the string was normalized.
u32string to_u32string() – Returns the contents of the string encoded as UTF-32. See u32string.

¹ These functions can break a UTF-8 multibyte sequence. Where one exists, consider using the equivalent function with the "32" suffix instead.
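
To make the footnote concrete: cutting UTF-8 text at an arbitrary byte offset can leave part of a multibyte sequence behind. The Python sketch below is an illustration only, not this class's implementation. The helper truncate_utf8 is hypothetical. It backs up to the nearest code point boundary, similar in spirit to what the truncate() description states.

```python
def truncate_utf8(data: bytes, new_length: int) -> bytes:
    """Cut UTF-8 bytes to at most new_length bytes without splitting a sequence."""
    new_length = max(new_length, 0)
    if new_length >= len(data):
        return data
    # Trailing bytes of a multibyte sequence have the bit pattern 10xxxxxx.
    # If the cut would land on one, step back to the start of that sequence.
    while new_length > 0 and (data[new_length] & 0xC0) == 0x80:
        new_length -= 1
    return data[:new_length]


raw = "\u00fcber".encode("utf-8")  # b'\xc3\xbcber', 5 bytes
print(raw[:1])                # b'\xc3', a lone lead byte that is invalid on its own
print(truncate_utf8(raw, 1))  # b'', the cut backs up before the two-byte sequence
print(truncate_utf8(raw, 3))  # b'\xc3\xbcb'
```

A plain byte slice keeps exactly the requested length but may produce invalid UTF-8. The boundary-aware version may return fewer bytes but always leaves valid text.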

String Iteration

The string class represents text in UTF-8. If you iterate a string using the for( c : s ) syntax, the loop visits the string byte by byte, which can split code points. This can be resolved in one of two ways:

Use for( c : s.to_u32string() ). This will create a temporary u32string allowing you to iterate code point by code point.
Use the UTF-8 iteration functions below:
- char32_t utf8_next( int& index ) – Iterates a string by code point: returns the code point at index, then moves index forward to the next code point. Throws an exception if index is out of bounds. For example, the following outputs each of the four code points in order:
```
var s = "über";
if ( !s.empty() ) {
  for ( var index = 0; index < s.length(); ) {
    var c = s.utf8_next( index );
    Ext.WriteStream( c.to_string() );
  }
}
```
- char32_t utf8_prev( int& index ) – Iterates a string by code point in reverse: moves index backward to the previous code point, then returns the code point at that index. Throws an exception if index is out of bounds. For example, the following outputs each of the four code points in reverse order:
```
var s = "über";
if ( !s.empty() ) {
  for ( var index = int(s.length()); index > 0; ) {
    var c = s.utf8_prev( index );
    Ext.WriteStream( c.to_string() );
  }
}
```
- void utf8_skipForward( int& index, int count ) – Moves index forward by count code points. Throws an exception if index is out of bounds. For example, the following outputs the fourth code point of the string:
```
var s = "über";
var index = 0;
s.utf8_skipForward( index, 3 );
Ext.WriteStream( s.at32( index ).to_string() );
```
- void utf8_skipBackward( int& index, int count ) – Moves index backward by count code points. Throws an exception if index is out of bounds. For example, the following outputs the second code point of the string:
```
var s = "über";
var index = int( s.length() ); // index must be an int
s.utf8_skipBackward( index, 3 );
Ext.WriteStream( s.at32( index ).to_string() );
```
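
The iteration functions above step through the byte buffer one code point at a time. As a rough illustration of the idea, the hypothetical helpers next_code_point and prev_code_point below find code point boundaries by skipping trailing bytes. They are written in Python, not the scripting language documented here, and are not the class's actual implementation. They return the new index alongside the decoded value because Python has no reference parameters.

```python
from typing import Tuple


def _is_trail(byte: int) -> bool:
    """True for a UTF-8 continuation byte (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def next_code_point(data: bytes, index: int) -> Tuple[int, int]:
    """Return (code point starting at index, index of the following code point)."""
    if not 0 <= index < len(data):
        raise IndexError("index is out of bounds")
    end = index + 1
    while end < len(data) and _is_trail(data[end]):
        end += 1
    return ord(data[index:end].decode("utf-8")), end


def prev_code_point(data: bytes, index: int) -> Tuple[int, int]:
    """Return (code point ending just before index, index where it starts)."""
    if not 0 < index <= len(data):
        raise IndexError("index is out of bounds")
    start = index - 1
    while start > 0 and _is_trail(data[start]):
        start -= 1
    return ord(data[start:index].decode("utf-8")), start


data = "\u00fcber".encode("utf-8")

# Walk forward from the first byte.
forward, i = [], 0
while i < len(data):
    cp, i = next_code_point(data, i)
    forward.append(cp)

# Walk backward from one past the last byte.
backward, i = [], len(data)
while i > 0:
    cp, i = prev_code_point(data, i)
    backward.append(cp)

print([hex(cp) for cp in forward])   # ['0xfc', '0x62', '0x65', '0x72']
print([hex(cp) for cp in backward])  # ['0x72', '0x65', '0x62', '0xfc']
```

Both helpers assume the index already sits on a code point boundary of valid UTF-8, which mirrors how the loops in the examples above start at 0 or at the string length.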

Unicode Utility Methods

bool utf8_isSingle( int index ) – Returns true if the byte at index is a single-byte UTF-8 sequence. Throws an exception if index is out of bounds.
bool utf8_isLead( int index ) – Returns true if the byte at index is the first byte in a multi-byte UTF-8 sequence. Throws an exception if index is out of bounds.
bool utf8_isTrail( int index ) – Returns true if the byte at index is a trailing byte in a multi-byte UTF-8 sequence. Throws an exception if index is out of bounds.
int utf8_cpLength( int index ) – Returns the length of the UTF-8 sequence at index. Throws an exception if index is out of bounds.
bool u_isUpperCase( int index ) – Returns true if the code point at position index is an upper case letter character (equivalent to java.lang.Character.isUpperCase()). Throws an exception if index is out of bounds.
bool u_isLowerCase( int index ) – Returns true if the code point at position index is a lower case letter character (equivalent to java.lang.Character.isLowerCase()). Throws an exception if index is out of bounds.
bool u_isSpace( int index ) – Returns true if the code point at position index is a white space character; similar to C/POSIX isspace(). Throws an exception if index is out of bounds.
bool u_isSpaceChar( int index ) – Returns true if the code point at position index is a space character (equivalent to java.lang.Character.isSpaceChar()). Throws an exception if index is out of bounds.
bool u_isWhitespace( int index ) – Returns true if the code point at position index is a white space character (similar to java.lang.Character.isWhitespace()). Throws an exception if index is out of bounds.
bool u_isLetter( int index ) – Returns true if the code point at position index is a letter character (equivalent to java.lang.Character.isLetter()). Throws an exception if index is out of bounds.
bool u_isDigit( int index ) – Returns true if the code point at position index is a digit character (equivalent to java.lang.Character.isDigit()). Throws an exception if index is out of bounds.
bool u_isLetterOrDigit( int index ) – Returns true if the code point at position index is an alphanumeric character (letter or digit) (equivalent to java.lang.Character.isLetterOrDigit()). Throws an exception if index is out of bounds.
bool u_isPunctuation( int index ) – Returns true if the code point at index is a punctuation character. Throws an exception if index is out of bounds.
bool u_isBMP( int index ) – Returns true if the code point at index is in the Unicode Basic Multilingual Plane. Throws an exception if index is out of bounds.
bool u_isGraphic( int index ) – Returns true if the code point at index is a "graphic" character (printable, excluding spaces). Throws an exception if index is out of bounds.
bool u_isPrintable( int index ) – Returns true if the code point at index is a printable character. Throws an exception if index is out of bounds.
bool u_isBlank( int index ) – Returns true if the code point at index is a "blank" or "horizontal space", a character that visibly separates words on a line. Throws an exception if index is out of bounds.
bool u_isDefined( int index ) – Returns true if the code point at index is "defined", which usually means it is assigned a character in Unicode (equivalent to java.lang.Character.isDefined()). Throws an exception if index is out of bounds.
int u_charType( int index ) – Returns the Unicode general category of the code point at position index (equivalent to java.lang.Character.getType()). The return value can be compared against the UnicodeCharTypeConstants values. Throws an exception if index is out of bounds. See UnicodeCharTypeConstants.
int u_digit( int index, int radix ) – Returns the decimal digit value of the code point at position index in the specified radix. Throws an exception if index is out of bounds.
void u_toLower( int index ) – Changes the code point at index to lower case using non-Locale-based Unicode mapping. Throws an exception if index is out of bounds.
void u_toUpper( int index ) – Changes the code point at index to upper case using non-Locale-based Unicode mapping. Throws an exception if index is out of bounds.
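
The utf8_isSingle, utf8_isLead, utf8_isTrail and utf8_cpLength checks correspond to the bit patterns that UTF-8 assigns to each byte. The Python sketch below illustrates those patterns. It is not the class's implementation and the helper names are hypothetical. How the real functions treat malformed bytes is not stated in this reference.

```python
def is_single(byte: int) -> bool:
    """0xxxxxxx: a complete one-byte (ASCII) sequence."""
    return byte < 0x80


def is_trail(byte: int) -> bool:
    """10xxxxxx: a continuation byte inside a multibyte sequence."""
    return (byte & 0xC0) == 0x80


def is_lead(byte: int) -> bool:
    """11xxxxxx: the first byte of a multibyte sequence."""
    return (byte & 0xC0) == 0xC0


def sequence_length(lead: int) -> int:
    """Length in bytes announced by the first byte of a sequence."""
    if is_single(lead):
        return 1
    if (lead & 0xE0) == 0xC0:  # 110xxxxx
        return 2
    if (lead & 0xF0) == 0xE0:  # 1110xxxx
        return 3
    if (lead & 0xF8) == 0xF0:  # 11110xxx
        return 4
    raise ValueError("not the first byte of a UTF-8 sequence")


data = "a\u00fc\u20ac".encode("utf-8")  # 1-byte, 2-byte and 3-byte sequences
print([sequence_length(b) for b in data if not is_trail(b)])  # [1, 2, 3]
```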