Skip to content

String Functions

String functions are used to manipulate string data.

CONCAT

text
concat(col1, col2, ...)

Concatenates arrays or strings. This function accepts any number of arguments and returns a string or an array.

ENDSWITH

text
endswith(col1, col2)

Returns a boolean indicating whether the first string argument ends with the second string argument.

FORMAT_TIME

text
format_time(col, format)

Format a datetime to string. The 'col' will be cast to datetime type if it is bigint, float or string type before formatting. Please check format patterns for how to compose the format.

Format_time patterns

A pattern is used to create a format string. Patterns are based on a simple sequence of letters and symbols which is common in many languages like Java, etc. The supported symbols in Kuiper are

SymbolMeaningExample
GeraG(AD)
YyearYYYY(2004), YY(04)
MmonthM(1), MM(01), MMM(Jan), MMMM(January)
dday of monthd(2), dd(02)
Eday of weekEEE(Mon), EEEE(Monday)
Hhour in 24 hours formatHH(15)
hhour in 12 hours formath(2), hh(03)
aAM or PMa(PM)
mminutem(4), mm(04)
sseconds(5), ss(05)
Sfraction of secondS(.0), SS(.00), SSS(.000)
ztime zone namez(MST)
Z4 digits time zone offsetZ(-0700)
Xtime zone offsetX(-07), XX(-0700), XXX(-07:00)
\Escape character\Z(Z) \X(X)

Examples:

  • YYYY-MM-dd T HH:mm:ss -> 2006-01-02 T 15:04:05
  • YYYY/MM/dd HH:mm:ssSSS XXX -> 2006/01/02 15:04:05.000 -07:00
  • yyyy-MM-ddTHH:mm:ssSS\ZXX -> 2006-01-02T15:04:05.00Z-0700

INDEXOF

text
indexof(col1, col2)

Returns the first index (0-based) of the second argument as a substring in the first argument.

LENGTH

text
length(col)

Returns the number of characters in the provided string.

LOWER

text
lower(col)

Returns the lowercase version of the given string.

LPAD

text
lpad(col, 2)

Returns the string argument, padded on the left side with the number of spaces specified by the second argument. Notice that, if the second argument is big, the string will take up a lot of memory. Avoid using a big number as the second argument, for example, use where clause to filter SELECT lpad(col, len) from source WHERE len < 999

LTRIM

text
ltrim(col)

Removes all leading whitespace (tabs and spaces) from the provided string.

NUMBYTES

text
numbytes(col)

Returns the number of bytes in the UTF-8 encoding of the provided string.

REGEXP_MATCHES

text
regexp_matches(col, regex)

Returns true if the string (first argument) contains a match for the regular expression.

REGEXP_REPLACE

text
regexp_replace(col, regex, replacement)

Replaces all substrings of the specified string value that matches regexp with replacement.

REGEXP_SUBSTRING

text
regexp_substring(col, regex)

Returns the first substring of the specified string value that matches regexp.

RPAD

text
rpad(col, 2)

Returns the string argument, padded on the right side with the number of spaces specified by the second argument. Notice that, if the second argument is big, the string will take up a lot of memory. Avoid using a big number as the second argument, for example, use where clause to filter SELECT rpad(col, len) from source WHERE len < 999999

RTRIM

text
rtrim(col)

Removes all trailing whitespace (tabs and spaces) from the provided string.

SUBSTRING

text
substring(col, start, length)

Returns the substring of the specified string value starting at the specified index position (0-based, inclusive) for up to the specified length of characters.

STARTSWITH

text
startswith(col, str)

Returns a boolean indicating whether the first string argument starts with the second string argument.

SPLIT_VALUE

text
split_value(col, str_splitter, index)

Split the value of the 1st parameter with the 2nd parameter, and return the value of split array that indexed with the 3rd parameter.

For example, split_value("/test/device001/message","/",0) AS a, the returned value of function is empty;

split_value("/test/device001/message","/",3) AS a, the returned value of function is message.

TRIM

text
trim(col)

Removes all leading and trailing whitespace (tabs and spaces) from the provided string.

UPPER

text
upper(col)

Returns the uppercase version of the given string.

FORMAT

text
format(col,D[,locale])

Formats the number X to a format like '#######.##', rounded to D decimal places, and returns the result as a string, The optional third parameter enables a locale to be specified to be used for the result number's decimal point.

sql
SELECT format(12332.1234567, 4,"en_US");
-> '12,332.1235'

Depending on the region, the display format of numbers can vary greatly. For example, here are examples to format the number 123456.78 specifically for certain regions:

Locale ValueFormat
en_US123,456.78
de_DE123.456,78
de_CH123'456.78

More regional options:

Locale ValueMeaning
ar_AEArabic - United Arab Emirates
ar_BHArabic - Bahrain
ar_DZArabic - Algeria
ar_EGArabic - Egypt
ar_INArabic - India
ar_IQArabic - Iraq
ar_JOArabic - Jordan
ar_KWArabic - Kuwait
ar_LBArabic - Lebanon
ar_LYArabic - Libya
ar_MAArabic - Morocco
ar_OMArabic - Oman
ar_QAArabic - Qatar
ar_SAArabic - Saudi Arabia
ar_SDArabic - Sudan
ar_SYArabic - Syria
ar_TNArabic - Tunisia
ar_YEArabic - Yemen
be_BYBelarusian - Belarus
bg_BGBulgarian - Bulgaria
ca_ESCatalan - Spain
cs_CZCzech - Czech Republic
da_DKDanish - Denmark
de_ATGerman - Austria
de_BEGerman - Belgium
de_CHGerman - Switzerland
de_DEGerman - Germany
de_LUGerman - Luxembourg
el_GRGreek - Greece
en_AUEnglish - Australia
en_CAEnglish - Canada
en_GBEnglish - United Kingdom
en_INEnglish - India
en_NZEnglish - New Zealand
en_PHEnglish - Philippines
en_USEnglish - United States
en_ZAEnglish - South Africa
en_ZWEnglish - Zimbabwe
es_ARSpanish - Argentina
es_BOSpanish - Bolivia
es_CLSpanish - Chile
es_COSpanish - Colombia
es_CRSpanish - Costa Rica
es_DOSpanish - Dominican Republic
es_ECSpanish - Ecuador
es_ESSpanish - Spain
es_GTSpanish - Guatemala
es_HNSpanish - Honduras
es_MXSpanish - Mexico
es_NISpanish - Nicaragua
es_PASpanish - Panama
es_PESpanish - Peru
es_PRSpanish - Puerto Rico
es_PYSpanish - Paraguay
es_SVSpanish - El Salvador
es_USSpanish - United States
es_UYSpanish - Uruguay
es_VESpanish - Venezuela
et_EEEstonian - Estonia
eu_ESBasque - Spain
fi_FIFinnish - Finland
fo_FOFaroese - Faroe Islands
fr_BEFrench - Belgium
fr_CAFrench - Canada
fr_CHFrench - Switzerland
fr_FRFrench - France
fr_LUFrench - Luxembourg
gl_ESGalician - Spain
gu_INGujarati - India
he_ILHebrew - Israel
hi_INHindi - India
hr_HRCroatian - Croatia
hu_HUHungarian - Hungary
id_IDIndonesian - Indonesia
is_ISIcelandic - Iceland
it_CHItalian - Switzerland
it_ITItalian - Italy
ja_JPJapanese - Japan
ko_KRKorean - Republic of Korea
lt_LTLithuanian - Lithuania
lv_LVLatvian - Latvia
mk_MKMacedonian - North Macedonia
mn_MNMongolia - Mongolian
ms_MYMalay - Malaysia
nb_NONorwegian(Bokmål) - Norway
nl_BEDutch - Belgium
nl_NLDutch - The Netherlands
no_NONorwegian - Norway
pl_PLPolish - Poland
pt_BRPortugese - Brazil
pt_PTPortugese - Portugal
rm_CHRomansh - Switzerland
ro_RORomanian - Romania
ru_RURussian - Russia
ru_UARussian - Ukraine
sk_SKSlovak - Slovakia
sl_SISlovenian - Slovenia
sq_ALAlbanian - Albania
sr_RSSerbian - Serbia
sv_FISwedish - Finland
sv_SESwedish - Sweden
ta_INTamil - India
te_INTelugu - India
th_THThai - Thailand
tr_TRTurkish - Turkey
uk_UAUkrainian - Ukraine
ur_PKUrdu - Pakistan
vi_VNVietnamese - Vietnam
zh_CNChinese - China
zh_HKChinese - Hong Kong
zh_TWChinese - Taiwan