Regular Expressions Functions and Operators¶
A regular expression is a powerful way to specify a pattern for a complex search. The functions and operators described in this section performs regular expression matching on a string.
Regular Expressions Pattern Syntax¶
To implement regular expressions, CUBRID uses the Google RE2 library and the C++ <regex> standard library. You can select the regular expression library to use by setting the regexp_engine system parameter, and the Google RE2 library is used as a default.
The following sub-sections describes supported regular expression grammars with several examples.
Note
Compatibility Considerations
In the prior version of CUBRID 11, CUBRID used Henry Spencer’s implementation of regular expressions.
From the CUBRID 11.0, CUBRID removes the Henry Spencer library and uses C++ <regex> standard library to support regular expression functions and operators.
From the CUBRID 11.2, CUBRID has added the Google RE2 library. Either C++ <regex> or Google RE2 can be used by setting the system parameter.
1. The Henry Spencer’s implementation of regular expressions operates in byte-wise fashion. So the REGEXP and RLIKE were not multibyte safe, they only worked as ASCII encoding without considering the collation of operands.
2. The Spencer library supports the POSIX collating sequence expressions ([.character.]). But it does not support anymore. Also, character equivalents ([=word=]) does not support. CUBRID occurs an error when these collating element syntax is given.
3. The Spencer library matches line-terminator characters for the dot operator (.). But it does not.
4. The word-beginning and word-end boundary ([[:<:]] and [[:>:]]) doesn’t support anymore. Instead, the word boundary notation (\b) can be used.
Warning
C++ <regex> library caveats
There is an issue with the C++ <regex> library that can cause excessive recursive calls when the input string is long or the regular expression pattern is complex. Therefore, it is recommended to use Google RE2 instead of C++ <regex>. C++ <regex> is reserved for backwards compatibility and has been deprecated.
Note
Multibyte Character Comparision Considerations
C++ <regex> performs multibyte comparision by C++ <locale> standard dependent on system-supplied locales. Therefore, system locale should be installed on your system for locale-sensitive functions.
Special Pattern Characters¶
Special pattern characters are characters (or sequences of characters) that have a special meaning when they appear in a regular expression pattern, either to represent a character that is difficult to express in a string, or to represent a category of characters. Each of these special pattern characters is matched in a string against a single character (unless a quantifier specifies otherwise).
Characters |
Description |
---|---|
. |
Any character except line terminators (LF, CR, LS, PS). |
\t |
A horizontal tab character (same as \u0009). |
\n |
A newline (line feed) character (same as \u000A). |
\v |
A vertical tab character (same as \u000B). |
\f |
A form feed character (same as \u000C). |
\r |
A carriage return character (same as \u000D) |
\cletter |
A control code character whose code unit value is the same as the remainder of dividing the code unit value of letter by 32. |
\xhh |
A a character whose code unit value has an hex value equivalent to the two hex digits hh. |
\uhhhh |
A character whose code unit value has an hex value equivalent to the four hex digits hhhh. Only supported in C++ <regex> |
\x{digits} |
A hex character code corresponds to the value of digits Only supported in Google RE2 |
\0 |
A null character (same as \u0000). Only supported in C++ <regex> |
\num |
The result of the submatch whose opening parenthesis is the num-th. See groups below for more info. |
\d |
A decimal digit character (same as [[:digit:]]). |
\D |
Any character that is not a decimal digit character (same as [^[:digit:]]). |
\s |
A whitespace character (same as [[:space:]]). |
\S |
Any character that is not a whitespace character (same as [^[:space:]]). |
\w |
An alphanumeric or underscore character (same as [_[:alnum:]]). |
\W |
Any character that is not an alphanumeric or underscore character (same as [^_[:alnum:]]). |
\character |
The character character as it is, without interpreting its special meaning within a regex expression.
Any character can be escaped except those which form any of the special character sequences above.
Needed for: ^ $ \ . * + ? ( ) [ ] { } |
|
[class] |
A string is part of the class. see POSIX-based character classes. |
[^class] |
A string is not part of the class. see POSIX-based character classes. |
\p{class} |
A string is range of Unicode range of class. see Unicode Character Class. |
-- .: match any character
SELECT ('cubrid dbms' REGEXP '^c.*$');
('cubrid dbms' regexp '^c.*$')
================================
1
To match special characters such as “\n”, “\t”, “\r”, and “\\”, some must be escaped with the backslash (\) by specifying the value of no_backslash_escapes (default: yes) to no. For details on no_backslash_escapes, see Escape Special Characters.
-- \n : match a special character, when no_backslash_escapes=yes (default)
SELECT ('new\nline' REGEXP 'new\\nline');
('new\nline' REGEXP 'new\\nline');
=====================================
1
-- \n : match a special character, when no_backslash_escapes=no
SELECT ('new\nline' REGEXP 'new
line');
('new
line' regexp 'new
line')
=====================================
0
Quantifiers¶
Quantifiers follow a character or a special pattern character. They can modify the amount of times that character is repeated in the match:
Characters |
Description |
---|---|
* |
The preceding is matched 0 or more times. |
+ |
The preceding is matched 1 or more times. |
? |
The preceding is optional (matched either 0 times or once). |
{num} |
The preceding is matched exactly num times. |
{num,} |
The preceding is matched num or more times. |
{min,max} |
The preceding is matched at least min times, but not more than max. |
-- a+ : match any sequence of one or more a characters. case insensitive.
SELECT ('Aaaapricot' REGEXP '^A+pricot');
('Aaaapricot' regexp '^A+pricot')
================================
1
-- a? : match either zero or one a character.
SELECT ('Apricot' REGEXP '^Aa?pricot');
('Apricot' regexp '^Aa?pricot')
==========================
1
SELECT ('Aapricot' REGEXP '^Aa?pricot');
('Aapricot' regexp '^Aa?pricot')
===========================
1
SELECT ('Aaapricot' REGEXP '^Aa?pricot');
('Aaapricot' regexp '^Aa?pricot')
============================
0
-- (cub)* : match zero or more instances of the sequence abc.
SELECT ('cubcub' REGEXP '^(cub)*$');
('cubcub' regexp '^(cub)*$')
==========================
1
By default, all these quantifiers perform in a greedy way which takes as many characters that meet the condition as possible. And this behavior can be overridden to non-greedy by adding a question mark (?) after the quantifier.
-- (a+), (a+?) : match with quantifiers performs greedy and ungreedy respectively.
SELECT REGEXP_SUBSTR ('aardvark', '(a+)'), REGEXP_SUBSTR ('aardvark', '(a+?)');
regexp_substr('aardvark', '(a+)') regexp_substr('aardvark', '(a+?)')
============================================
'aa' 'a'
Groups¶
Groups allow to apply quantifiers to a sequence of characters (instead of a single character). There are two kinds of groups:
Warning
a backreference syntax using $int is supported only in C++ <regex>.
Characters |
Description |
---|---|
(subpattern) |
Group which creates a backreference. |
(?:subpattern) |
Passive group which does not create a backreference. |
-- The captured group can be referenced with $int
SELECT REGEXP_REPLACE ('hello cubrid','([[:alnum:]]+)','$1!');
regexp_replace('hello cubrid','([[:alnum:]]+)','$1!')
==========================
'hello! cubrid!'
When a group creates a backreference, the characters that represent the subpattern in a string are stored as a submatch. Each submatch is numbered after the order of appearance of their opening parenthesis (the first submatch is number 1, the second is number 2, and so on…). These submatches can be used in the regular expression itself to specify that the entire subpattern should appear again somewhere else (see int in the special characters list). They can also be used in the replacement string in the REGEXP_REPLACE function.
-- performs regexp_substr without groups. the following is the case that fully matched.
SELECT REGEXP_SUBSTR ('abckabcjabc', '[a-c]{3}k[a-c]{3}j[a-c]{3}');
-- ([a-c]{3}) creates a backreference, \1
SELECT REGEXP_SUBSTR ('abckabcjabc', '([a-c]{3})k\1j\1');
regexp_substr('abckabcjabc', '[a-c]{3}k[a-c]{3}j[a-c]{3}')
======================
'abckabcjabc'
regexp_substr('abckabcjabc', '([a-c]{3})k\1j\1')
======================
'abckabcjabc'
Assertions¶
Assertions are conditions that do not consume characters in a string: they do not describe a character, but a condition that must be fulfilled before or after a character.
Characters |
Description |
---|---|
^ |
The beginning of a string, or follows a line terminator |
$ |
The end of a string, or precedes a line terminator |
Note
The following syntax is supported only in C++ <regex>.
Characters |
Description |
---|---|
\b |
The previous character is a word character and the next is a non-word character (or vice-versa). |
\B |
The previous and next characters are both word characters or both are non-word characters. |
(?=subpattern) |
Positive lookahead. The characters following the character must match subpattern, but no characters are consumed. |
(?!subpattern) |
Negative lookahead. The characters following the assertion must not match subpattern, but no characters are consumed. |
-- ^ : match the beginning of a string
SELECT ('cubrid dbms' REGEXP '^cub');
('cubrid dbms' regexp '^cub')
===============================
1
-- $ : match the end of a string
SELECT ('this is cubrid dbms' REGEXP 'dbms$');
('this is cubrid dbms' regexp 'dbms$')
========================================
1
-- (?=subpattern): positive lookahead
SELECT REGEXP_REPLACE ('cubrid dbms cubrid sql cubrid rdbms', 'cubrid(?= sql)', 'CUBRID');
-- (?!subpattern): nagative lookahead
SELECT REGEXP_REPLACE ('cubrid dbms cubrid sql cubrid rdbms', 'cubrid(?! sql)', 'CUBRID');
regexp_replace('cubrid dbms cubrid sql cubrid rdbms', 'cubrid(?= sql)', 'CUBRID')
======================
'cubrid dbms CUBRID sql cubrid rdbms'
regexp_replace('cubrid dbms cubrid sql cubrid rdbms', 'cubrid(?! sql)', 'CUBRID')
======================
'CUBRID dbms cubrid sql CUBRID rdbms'
Alternatives¶
A pattern can include different alternatives:
Characters |
Description |
---|---|
| |
Separates two alternative patterns or subpatterns. |
-- a|b : matches any character that is either a or b.
SELECT ('a' REGEXP 'a|b');
SELECT ('d' REGEXP 'a|b');
('a' regexp 'a|b')
==============================
1
('d' regexp 'a|b')
==============================
0
A regular expression can contain multiple alternative patterns simply by separating them with the separator operator (|): The regular expression will match if any of the alternatives match, and as soon as one does. Subpatterns (in groups or assertions) can also use the separator operator to separate different alternatives.
-- a|b|c : matches any character that is either a, b or c.
SELECT ('a' REGEXP 'a|b|c');
SELECT ('d' REGEXP 'a|b|c');
('a' regexp 'a|b|c')
==============================
1
('d' regexp 'a|b|c')
==============================
0
Character classes¶
Character classes syntax matches one of characters or a category of characters within square brackets.
Individual characters¶
Any character specified is considered part of the class (except the characters \, [, ]).
-- [abc] : matches any character that is either a, b or c.
SELECT ('a' REGEXP '[abc]');
SELECT ('d' REGEXP '[abc]');
('a' regexp '[abc]')
==============================
1
('d' regexp '[abc]')
==============================
0
Ranges¶
To represent a range of characters, use the dash character (-) between two valid characters. For example, “[a-z]” matches any alphabet letter whereas “[0-9]” matches any single number.
SELECT ('adf' REGEXP '[a-f]');
SELECT ('adf' REGEXP '[g-z]');
('adf' regexp '[a-f]')
================================
1
('adf' regexp '[g-z]')
================================
0
-- [0-9]+: matches number sequence in a string
SELECT REGEXP_SUBSTR ('aas200gjb', '[0-9]+');
regexp_substr('aas200gjb', '[0-9]+')
======================
'200'
SELECT ('strike' REGEXP '^[^a-dXYZ]+$');
('strike' regexp '^[^a-dXYZ]+$')
================================
1
POSIX-based character classes¶
The POSIX-based character class ([:classname:]) defines categories of characters as shown below.
Note
Google RE2 matches only ASCII characters, and C++ <regex> matches Unicode characters as well.
Class |
Description |
---|---|
[:alnum:] |
Alpha-numerical character |
[:alpha:] |
Alphabetic character |
[:blank:] |
Blank character |
[:cntrl:] |
Control character |
[:digit:] |
Decimal digit character |
[:graph:] |
Character with graphical representation |
[:lower:] |
Lowercase letter |
[:print:] |
Printable character |
[:punct:] |
Punctuation mark character |
[:space:] |
Whitespace character |
[:upper:] |
Uppercase letter |
[:xdigit:] |
Hexadecimal digit character |
[:d:], [:w:] and [:s:] are an extension to the ECMAScript grammar. Only available in C++ <regex>.
Class |
Description |
---|---|
[:d:] |
Decimal digit character (0-9) |
[:w:] |
Word character |
[:s:] |
Whitespace character |
SELECT REGEXP_SUBSTR ('Samseong-ro 86-gil, Gangnam-gu, Seoul 06178', '[[:digit:]]{5}');
regexp_substr('Samseong-ro 86-gil, Gangnam-gu, Seoul 06178', '[[:digit:]]{5}')
======================
'06178'
-- ;set regexp_engine=cppstd
SET NAMES utf8 COLLATE utf8_ko_cs;
SELECT REGEXP_REPLACE ('가나다 가나 가나다라', '\b([[:alpha:]]{2})\b', '#');
regexp_replace('가나다 가나 가나다라' collate utf8_ko_cs, '\b([[:alpha:]]{2})\b' collate utf8_ko_cs, '#' collate utf8_ko_cs)
======================
'가나다 # 가나다라'
Unicode Character Class¶
Unicode character classes (\p{classname}) are only supported by Google RE2. The Unicode class name can be specified as shown in the table below. For example, to match Korean characters, you can use \p{Hangul}.
Unicode Character Class |
---|
Adlam |
Ahom |
Anatolian_Hieroglyphs |
Arabic |
Armenian |
Avestan |
Balinese |
Bamum |
Bassa_Vah |
Batak |
Bengali |
Bhaiksuki |
Bopomofo |
Brahmi |
Braille |
Buginese |
Buhid |
Canadian_Aboriginal |
Carian |
Caucasian_Albanian |
Chakma |
Cham |
Cherokee |
Chorasmian |
Common |
Coptic |
Cuneiform |
Cypriot |
Cypro_Minoan |
Cyrillic |
Deseret |
Devanagari |
Dives_Akuru |
Dogra |
Duployan |
Egyptian_Hieroglyphs |
Elbasan |
Elymaic |
Ethiopic |
Georgian |
Glagolitic |
Gothic |
Grantha |
Greek |
Gujarati |
Gunjala_Gondi |
Gurmukhi |
Han |
Hangul |
Hanifi_Rohingya |
Hanunoo |
Hatran |
Hebrew |
Hiragana |
Imperial_Aramaic |
Inherited |
Inscriptional_Pahlavi |
Inscriptional_Parthian |
Javanese |
Kaithi |
Kannada |
Katakana |
Kawi |
Kayah_Li |
Kharoshthi |
Khitan_Small_Script |
Khmer |
Khojki |
Khudawadi |
Lao |
Latin |
Lepcha |
Limbu |
Linear_A |
Linear_B |
Lisu |
Lycian |
Lydian |
Mahajani |
Makasar |
Malayalam |
Mandaic |
Manichaean |
Marchen |
Masaram_Gondi |
Medefaidrin |
Meetei_Mayek |
Mende_Kikakui |
Meroitic_Cursive |
Meroitic_Hieroglyphs |
Miao |
Modi |
Mongolian |
Mro |
Multani |
Myanmar |
Nabataean |
Nag_Mundari |
Nandinagari |
New_Tai_Lue |
Newa |
Nko |
Nushu |
Nyiakeng_Puachue_Hmong |
Ogham |
Ol_Chiki |
Old_Hungarian |
Old_Italic |
Old_North_Arabian |
Old_Permic |
Old_Persian |
Old_Sogdian |
Old_South_Arabian |
Old_Turkic |
Old_Uyghur |
Oriya |
Osage |
Osmanya |
Pahawh_Hmong |
Palmyrene |
Pau_Cin_Hau |
Phags_Pa |
Phoenician |
Psalter_Pahlavi |
Rejang |
Runic |
Samaritan |
Saurashtra |
Sharada |
Shavian |
Siddham |
SignWriting |
Sinhala |
Sogdian |
Sora_Sompeng |
Soyombo |
Sundanese |
Syloti_Nagri |
Syriac |
Tagalog |
Tagbanwa |
Tai_Le |
Tai_Tham |
Tai_Viet |
Takri |
Tamil |
Tangsa |
Tangut |
Telugu |
Thaana |
Thai |
Tibetan |
Tifinagh |
Tirhuta |
Toto |
Ugaritic |
Vai |
Vithkuqi |
Wancho |
Warang_Citi |
Yezidi |
Yi |
Zanabazar_Square |
-- ;set regexp_engine=re2
SELECT REGEXP_COUNT('가나 가나다라 마바사아 자차카타 파하', '\p{Hangul}+');
regexp_count(_utf8'가나 가나다라 마바사아 자차카타 파하' collate utf8_ko_cs, _utf8'\p{Hangul}+' collate utf8_ko_cs)
==============================
5
REGEXP, RLIKE¶
The REGEXP and RLIKE are used interchangeably. It performs a regular expression matcinh of a string. In the below syntax, if expression matches pattern, 1 is returned; otherwise, 0 is returned. If either expression or pattern is NULL, NULL is returned. The second syntax has the same meaning as the third syntax, which both syntaxes are using NOT.
expression REGEXP | RLIKE [BINARY] pattern
expression NOT REGEXP | RLIKE pattern
NOT (expression REGEXP | RLIKE pattern)
expression : Column or input expression
pattern : Pattern used in regular expressions
The difference between REGEXP and LIKE are as follows:
The LIKE operator succeeds only if the pattern matches the entire value.
The REGEXP operator succeeds if the pattern matches anywhere in the value. To match the entire value, you should use “^” at the beginning and “$” at the end.
The LIKE operator is case sensitive, but patterns of regular expressions in REGEXP is not case sensitive. To enable case sensitive, you should use REGEXP BINARY statement.
-- [a-dX] : matches any character that is either a, b, c, d or X.
SELECT ('aXbc' REGEXP '[a-dX]');
('aXbc' regexp '[a-dX]')
==============================
1
-- When REGEXP is used in SELECT list, enclosing this with parentheses is required.
-- But used in WHERE clause, no need parentheses.
-- case insensitive, except when used with BINARY.
SELECT name FROM public.athlete where name REGEXP '^[a-d]';
name
======================
'Dziouba Irina'
'Dzieciol Iwona'
'Dzamalutdinov Kamil'
'Crucq Maurits'
'Crosta Daniele'
'Bukovec Brigita'
'Bukic Perica'
'Abdullayev Namik'
REGEXP_COUNT¶
- REGEXP_COUNT(string, pattern_string[, position[, match_type]])¶
The REGEXP_COUNT function returns the number of occurrences of the regular expression pattern, pattern_string, within a given character string, string. If NULL is specified as an argument, NULL is returned.
- Parameters:
string – Specifies the original string. If the value is NULL, NULL is returned.
pattern_string – Specifies the regular expression pattern string to be searched. If the value is NULL, NULL is returned.
position – Specifies the position of the string to start the search. If the value is omitted, the default value 1 is applied. If the value is negative or zero, an error will be returned. If the value is NULL, NULL is returned
match_type – Specifies the string to change default matching behavior of the function. If the value is omitted, the default value ‘i’ is applied. If the value is other than ‘c’ or ‘i’, an error will be returned. If the value is NULL, NULL is returned.
- Return type:
INT
-- it returns NULL when an argument is specified with NULL value
SELECT REGEXP_COUNT('ab123ab111a','[a-d]+',NULL);
regexp_count('ab123ab111a','[a-d]+',NULL)
======================
NULL
-- an empty string pattern doesn't match with any string
SELECT REGEXP_COUNT('ab123ab111a','');
regexp_count('ab123ab111a','')
======================
0
SELECT REGEXP_COUNT('ab123Ab111aAA','[a-d]', 3);
regexp_count('ab123Ab111aAA', '[a-d]', 3)
===========================================
5
-- case insensitive ('i') is the default value
SELECT REGEXP_COUNT('ab123Ab111aAA','[a-d]', 3, 'i');
-- If case sensitive ('c') is specified as match_type, A is not matched.
SELECT REGEXP_COUNT('ab123Ab111aAA','[a-d]', 3, 'c');
regexp_count('ab123Ab111aAA', '[a-d]', 3, 'i')
================================================
5
regexp_count('ab123Ab111aAA', '[a-d]', 3, 'c')
================================================
2
SET NAMES utf8 COLLATE utf8_ko_cs;
SELECT REGEXP_COUNT('가나123abc가다abc가가','[가-나]+');
regexp_count('가나123abc가다abc가가' collate utf8_ko_cs, '[가-나]+' collate utf8_ko_cs)
=================================================================================================
3
REGEXP_INSTR¶
- REGEXP_INSTR(string, pattern_string[, position[, occurrence[, return_option[, match_type]]]])¶
The REGEXP_INSTR function returns the beginning or ending position by searching for a regular expression pattern, pattern_string, within a given character string, string. If NULL is specified as an argument, NULL is returned.
- Parameters:
string – Specifies the original string. If the value is NULL, NULL is returned.
pattern_string – Specifies the regular expression pattern string to be searched. If the value is NULL, NULL is returned.
position – Specifies the position of the string to start the search. If the value is omitted, the default value 1 is applied. If the value is negative or zero, an error will be returned. If the value is NULL, NULL is returned
occurrence – Specifies the occurrence of the match to use. If the value is omitted, the default value 1 is applied. If the value is negative, an error will be returned. If the value is NULL, NULL is returned.
return_option – Specifies whether to return the position of the start or end of the matched string. If the value is 0, the position of the first character of the match is returned. If the value is 0, the starting position of the matched string is returned. If the value is 1, the end position of the matched string is returned. If the value is other than 0 or 1, an error will be returned. If the value is NULL, NULL is returned.
match_type – Specifies the string to change default matching behavior of the function. If the value is omitted, the default value ‘i’ is applied. If the value is other than ‘c’ or ‘i’, an error will be returned. If the value is NULL, NULL is returned.
- Return type:
INT
-- it returns NULL when an argument is specified with NULL value
SELECT REGEXP_INSTR('12345abcdeabcde','[abc]',NULL);
regexp_instr('12345abcdeabcde', '[abc]', null)
======================
NULL
-- an empty string pattern doesn't match with any string
SELECT REGEXP_INSTR('12345abcdeabcde','');
regexp_instr('12345abcdeabcde', '')
======================
0
-- it returns the position of the first character of the match.
SELECT REGEXP_INSTR('12354abc5','[:alpha:]+',1,1,0);
regexp_instr('12354abc5','[:alpha:]+', 1, 1, 0);
======================
6
-- it returns the position of the character following the match.
SELECT REGEXP_INSTR('12354abc5','[:alpha:]+',1,1,1);
regexp_instr('12354abc5','[:alpha:]+', 1, 1, 1);
======================
7
SET NAMES utf8 COLLATE utf8_ko_cs;
SELECT REGEXP_INSTR('12345가나다라마가나다라마바','[가-다]+');
regexp_instr('12345가나다라마가나다라마바' collate utf8_ko_cs, '[가-다]+' collate utf8_ko_cs)
============================================================================================================
6
REGEXP_LIKE¶
- REGEXP_LIKE(string, pattern_string[, match_type])¶
The REGEXP_LIKE function searches for a regular expression pattern, pattern_string, within a given character string, string. If the pattern matched anywhere in the string, 1 is returned. Otherwise, 0 is returned. If NULL is specified as an argument, NULL is returned.
- Parameters:
string – Specifies the original string. If the value is NULL, NULL is returned.
pattern_string – Specifies the regular expression pattern string to be searched. If the value is NULL, NULL is returned.
match_type – Specifies the string to change default matching behavior of the function. If the value is omitted, the default value ‘i’ is applied. If the value is other than ‘c’ or ‘i’, an error will be returned. If the value is NULL, NULL is returned.
- Return type:
INT
SELECT REGEXP_LIKE('abbbbc','ab+c');
regexp_like('abbbbc', 'ab+c');
======================
1
-- an empty string pattern doesn't match with any string
SELECT REGEXP_LIKE('abbbbc','');
regexp_like('abbbbc', '');
======================
0
SELECT REGEXP_LIKE('abbbbc','AB+C', 'c');
regexp_like('abbbbc', 'AB+C');
======================
0
SET NAMES utf8 COLLATE utf8_ko_cs;
SELECT REGEXP_LIKE('가나다','가나?다');
SELECT REGEXP_LIKE('가나라다','가나?다');
regexp_like('가나다' collate utf8_ko_cs, '가나?다' collate utf8_ko_cs)
==============================================================================
1
regexp_like('가나라다' collate utf8_ko_cs, '가나?다' collate utf8_ko_cs)
=================================================================================
0
REGEXP_REPLACE¶
- REGEXP_REPLACE(string, pattern_string, replacement_string[, position[, occurrence[, match_type]]])¶
The REGEXP_REPLACE function searches for a regular expression pattern, pattern_string, within a given character string, string, and replaces it with a character string, replacement_string. If NULL is specified as an argument, NULL is returned.
- Parameters:
string – Specifies the original string. If the value is NULL, NULL is returned.
pattern_string – Specifies the regular expression pattern string to be searched. If the value is NULL, NULL is returned.
replacement_string – Specifies the string to replace the matched string by pattern_string. If the value is NULL, NULL is returned.
position – Specifies the position of the string to start the search. If the value is omitted, the default value 1 is applied. If the value is negative or zero, an error will be returned. If the value is NULL, NULL is returned
occurrence – Specifies the occurrence of the pattern to use. If the value is omitted, the default value 0 is applied. If the value is negative, an error will be returned. If the value is NULL, NULL is returned.
match_type – Specifies the string to change default matching behavior of the function. If the value is omitted, the default value ‘i’ is applied. If the value is other than ‘c’ or ‘i’, an error will be returned. If the value is NULL, NULL is returned.
- Return type:
STRING
-- it returns NULL when an argument is specified with NULL value
SELECT REGEXP_REPLACE('12345abcdeabcde','[a-d]',NULL);
regexp_replace('12345abcdeabcde', '[a-d]', null)
======================
NULL
-- an empty string pattern doesn't match with any string
SELECT REGEXP_REPLACE('12345abcdeabcde','','#');
regexp_replace('12345abcdeabcde', '', '#')
======================
'12345abcdeabcde'
SELECT REGEXP_REPLACE('12345abDEKBcde','[a-d]','#');
regexp_replace('12345abDEKBcde', '[a-d]', '#')
======================
'12345###EK###e'
-- case insensitive ('i') is the default value
SELECT REGEXP_REPLACE('12345abDEKBcde','[a-d]','#', 1, 0, 'i');
-- match_type is specified as case sensitive ('c'). 'B' and 'D' are not matched.
SELECT REGEXP_REPLACE('12345abDEKBcde','[a-d]','#', 1, 0, 'c');
regexp_replace('12345abDEKBcde', '[a-d]', '#', 1, 0, 'i')
======================
'12345###EK###e'
regexp_replace('12345abDEKBcde', '[a-d]', '#', 1, 0, 'c')
======================
'12345##DEKB##e'
SET NAMES utf8 COLLATE utf8_ko_cs;
SELECT REGEXP_REPLACE('a1가b2나다라','[가-다]','#',6);
regexp_replace('a1가b2나다라' collate utf8_ko_cs, '[가-다]' collate utf8_ko_cs, '#' collate utf8_ko_cs, 6)
======================
'a1가b2##라'
REGEXP_SUBSTR¶
- REGEXP_SUBSTR(string, pattern_string[, position[, occurrence[, match_type]]])¶
The REGEXP_SUBSTR function extracts a character string matched for a regular expression pattern, pattern_string, within a given character string, string. If NULL is specified as an argument, NULL is returned.
- Parameters:
string – Specifies the original string. If the value is NULL, NULL is returned.
pattern_string – Specifies the regular expression pattern string to be searched. If the value is NULL, NULL is returned.
position – Specifies the position of the string to start the search. If the value is omitted, the default value 1 is applied. If the value is negative or zero, an error will be returned. If the value is NULL, NULL is returned
occurrence – Specifies the occurrence of the pattern to use. If the value is omitted, the default value 0 is applied. If the value is negative, an error will be returned. If the value is NULL, NULL is returned.
match_type – Specifies the string to change default matching behavior of the function. If the value is omitted, the default value ‘i’ is applied. If the value is other than ‘c’ or ‘i’, an error will be returned. If the value is NULL, NULL is returned.
- Return type:
STRING
-- if pattern is not matched, null is returned
SELECT REGEXP_SUBSTR('12345abcdeabcde','[k-z]+');
regexp_substr('12345abcdeabcde','[k-z]+');
======================
NULL
-- an empty string pattern doesn't match with any string
SELECT REGEXP_SUBSTR('12345abcdeabcde','');
regexp_substr('12345abcdeabcde', '')
======================
NULL
SELECT REGEXP_SUBSTR('Samseong-ro, Gangnam-gu, Seoul',',[^,]+,');
regexp_substr('Samseong-ro, Gangnam-gu, Seoul', ',[^,]+,')
======================
', Gangnam-gu,'
SET NAMES utf8 COLLATE utf8_ko_cs;
SELECT REGEXP_SUBSTR('삼성로, 강남구, 서울특별시','\p{Hangul}+',1,2);
regexp_substr('삼성로, 강남구, 서울특별시' collate utf8_ko_cs, '\p{Hangul}+' collate utf8_ko_cs, 1, 2)
======================
'강남구'