The “NOT” functions

These group of functions returns the position of the first character value that is not a particular value.

As with the “ANY” functions, there is an optional parameter that specifies where to start the search and in which direction to search.

FIND, FINDC and FINDW

This pair of functions shares some similarities to the INDEX and INDEXC functions. FIND and INDEX both search a string for a given substring.

FINDC and INDEXC can be used to search for individual characters. However, both FIND and FINDC have some additional capability over their INEX and INDEXC.

In FIND and FINDC you can declare a starting position for the search, the direction of the search, and the ignore case or trailing blanks.

FIND

FIND is used to locate a substring within a string. By using arguments, you can define the starting point for the search, the direction of the search, and ignore the case or any trailing blanks.

Syntax:

FIND(character-value, find-string <,'modifiers'> <,start>)

The find-string argument is a character variable which contains one or more characters that you want to search for.

The function returns the first position in the character-value that contains the find-string. It returns 0 if the string is not found.

Modifiers:

i: ignore case.

ignore case. t: ignore trailing blanks in both the character variable and the find-string.

The start is an optional parameter that specifies the position in the string to begin the search. By default search starts at the beginning of the string.

If you specify a non-zero argument, the search begins at the position in the string of the absolute value of the number.

The search goes from left to right if the value is positive and right to left for negative values.

If you specify a negative value which is larger than the length of the string, the search begins from right to left, starting at the end of the string.

The function returns 0 if the value of start is a positive number or longer than the length of the string.

FINDC

FINDC locates a character that appears or does not appear within a string.

You can control the starting point for the search, the direction of the search, to ignore case or trailing blanks, or to look for characters except the ones listed by specifying optional arguments.

FIND is used to locate a substring within a string. By using arguments, you can define the starting point for the search, the direction of the search, and ignore the case or any trailing blanks.

Syntax:

FINDC(character-value, find-string <,'modifiers'> <,start>)

The function returns the first position in the character-value that contains one of the find-characters.

If none of the characters is found, the function returns a 0. With an optional argument, you can have the function return the position in a character string of a character that is not in the find-characters list.

Modifiers:

i : ignore case.

: ignore case. t : Using this argument will ignore the trailing blanks in both the character variable and the find-characters.

: Using this argument will ignore the trailing blanks in both the character variable and the find-characters. v : By specifying this argument you can count characters which are not in the list of find characters.

: By specifying this argument you can count characters which are not in the list of find characters. o: process the modifiers and find characters only once to a specific call to the function.

FINDW

Returns the character position of a word in a string, or returns the number of the word in a string.

FINDW is used to locate a substring within a string. By using arguments, you can define the starting point for the search, the direction of the search, and ignore the case or any trailing blanks.

SYNTAX : FINDW(string, word, start-position <, character(s) <, modifier(s)>>)

For the complete list of FINDW modifiers, you can refer to the SAS Documentation website.

Examples:

For these examples let STRING1 = “Hello hello goodbye” and STRING2 =”hello”

Functions Results FIND(STRING1, STRING2) 7 FIND(STRING1, STRING2, ‘I’) 1 FIND(“abcxyzabc”,”abc”,4) 7 FIND(STRING1, STRING2, “i”, -99) 7 FINDC(STRING1, STRING2) 5 FINDC(STRING1, STRING2, ‘i’) 1 FINDC(STRING1,”aple”,’vi’) 6 FINDC(“abcxyzabc”,”abc”,4); 4

INDEX, INDEXC, and INDEXW

These set of functions can be used to search for One or more characters in a string.

INDEXC searches for one or more individual characters and always searches from right to left.

INDEX

The INDEX function is used to locate the starting position of a substring in a string.

SYNTAX : INDEX(character-value,find-string)

INDEXC

It is used to search a character string for one or more characters.

SYNTAX : INDEXC(character-value, 'charr1','char2','char3',…)

INDEXW

To search a string for a word, defined as a group of letters separated on both ends by a word boundary (space, the beginning of a string, end of the string).

Note: Punctuations are not considered for word boundary while scanning.

SYNATX: INDEXW(character-value, find-string)

VERIFY

The VERIFY function is used to check if a string contains any unwanted values.

SYNTAX : VERIFY(character-value, verify-string)

VERIFY function returns the first position in the character-value which is not present in the verify string.

The function returns 0 if the character values do not contain any character other than the verify string.

Functions Results VERIFY(‘ABCXABD’, ‘ABCDE’) 4 (“X” is not in the verify string) VERIFY(“ABC “,”ABC”) 4 (position of the 1st blank) VERIFY(TRIM(“ABC “),”ABC”) 0 (no invalid characters)

SAS Character Functions that Extract Parts of Strings

The functions described in this section can extract parts of strings from a given string.

SUBSTR

The SUBSTR function is used to extract part of a string from a given string.

SYNTAX : SUBSTR(character-value,start,)

character-value is any SAS character expression.

The start is the starting position within the string. The length argument, if specified, is the number of characters to include in the length of the characters from the start position to the end of the string.

The length of the resulting variable will be the length of the character-value if the length is not previously assigned.

Functions Results SUBSTR(“ABC123XYZ”,4,2) 12 SUBSTR(“ABC123XYZ”,4) 123XYZ

SUBSTR (used on the left-hand side of the equal sign)

When used on the left-hand side of the equal sign, SUBSTR function can be used to place one or more characters into an existing string.

SYNTAX : SUBSTR(character-value,start,) = charcter-value

Character-value is any SAS character expression. The start is the starting position in a string where you want to place the length of the new character.

Length is the number of characters to be placed in that string. If length is omitted all the characters on the right-hand side of the equal sign replaces the characters in the character-value.

For this example let sample_str = "Pin Code 411014";

Functions Results SUBSTR(sample_str, 4, 5) = “:”; sample_str is now = Pin: 411014 SUBSTR(sample_str, 4) = “:”; sample_str is now = Pin:

SUBSTRN

This function serves the same purpose as the SUBSTR function with a few additional features. The starting position and the length arguments of the SUBSTRN function can be 0 or negative without causing an error.

Syntax:

SUBSTRN(character-value,start,)

SUBPAD

SUBPAD function returns a substring of the length specified in the argument.

Difference between SUBPAD and SUBSTR The difference between a SAS function and a call routine is that a SAS function performs a computation or system manipulation on arguments, and returns a value that can be used in an assignment statement or elsewhere in expressions. whereas using CALL routines is that you cannot use them in assignment statements or expressions. Syntax: SUBSTR(string, position,)

Example:

str="Hello, world"; str2=subpad(str,8,5); put str2=;

str2=world

CHAR

The CHAR function returns a single character from a specified position in a character string.

SYNTAX : CHAR(string,position)

FIRST

The FIRST function returns the first character in a given string having a length of 1. It returns a single blank if the length of the string is 0.

SYNTAX : FIRST(string)

Example:

str="Hello, world"; result=char(str,8); result2=first(str); put result=; put result2=;

result=W result2=H

SAS Character Functions That Join Two or More Strings Together

SAS has three call routines and four character functions that can concatenate character strings. Although you can use the || concatenation operator in combination with the STRIP, TRIM, or LEFT functions, these routines and functions make it much easier to put strings together and, if you wish, to place one or more separator characters between the strings.

An advantage of using the call routines over their corresponding functions is improved performance. For example, CALL CATS(A, C, E, G) is faster than R = CATS(A, C, E, G).

CALL CATS

CALL CATS is used to join two or more strings. It also removes both leading and trailing blanks before the concatenation takes place.

SYNTAX : CALL CATS(result-string,string-1)

CALL CATT

CALL CATT concatenates two or more strings, removing only the trailing blanks before the concatenation takes place.

SYNTAX : CALL CATT(result-string,,)

CALL CATX

CALL CATX is similar to CALL CATT. However, this has the facility pf placing a delimiter of your choice between each of the resultant string.

SYNTAX : CALL CATX(seperator,result-string,string-1,)

SAS Character Functions That Remove Blanks from Strings

There are times when you want to remove blanks from the beginning or end of a character string. There are two functions LEFT and RIGHT which shifts the characters to the beginning of the end of the string, respectively.

The TRIM, TRIMN, and STRIP functions are useful when you want to concatenate or compare strings which have leading or trailing blanks.

SAS LEFT Function

SAS LEFT function is used to left-align text values. It doesn’t remove the leading blanks instead it moves them to the end of the string. Thus, it doesn’t change the storage length of the variable, even when you assign the result of LEFT to a new variable.

The LEFT function is particularly useful if values were read with the $CHAR informat, which preserves leading blanks.

RIGHT Function

RIGHT is used to right-align a text string. If the length of a character variable has previously been defined and it contains trailing blanks, the RIGHT function will move the characters to the end of the string and pad with blanks to the beginning. So, the final length of the variable remains unchanged.

COMPBL

COMPBL is used to replace all occurrences of two or more blanks with a single blank character. The length of the result variable will be the length of the argument if it has not been defined.

SYNTAX : COMPBL(charcter-value)

COMPRESS

COMPRESS function is used to remove specific characters from a character value. Similar to COMPBL, If a length has not been previously assigned, the length of the resulting variable will be the length of the argument. Read more on the compress function.

SYNTAX : COMPRESS(character-value,)

The compress-list is an optional list of the characters you want to remove. Blanks are removed as default if no arguments are specified.

TRIM, TRIMN, and STRIP

TRIM and TRIMN both removes trailing blanks. STRIP returns both leading and trailing blanks.

TRIM

To remove trailing blanks from a character value. This is especially useful when you want to concatenate several strings together and each string may contain trailing blanks.

SYNTAX : TRIM(character-value)

The length of the variable returned by the TRIM function will be the same length as the argument unless the length of this variable has been previously defined.

If the result of the TRIM function is assigned to a variable with a length longer than the trimmed argument, the resulting variable will be padded with blanks.

TRIMN

To remove trailing blanks from a character value. This is especially useful when you want to concatenate several strings together and each string may contain trailing blanks.

SYNTAX : TRIMN(character-value)

SAS STRIP Function

You can use the SAS STRIP function to remove all the leading and trailing blanks from character variables or strings. STRIP(CHAR) is equivalent to TRIMN(LEFT(CHAR)).

SYNTAX : STRIP(character-value)

data RemoveBlanks; length y $ 10; string=" ABC D EF G H I "; x=TRIM(string); y=TRIM(string); TRIM='*'||TRIM(string)||'*'; TRIMN='*'||TRIMN(string)||'*'; STRIP='*'||STRIP(string)||'*'; LEFT='*'||LEFT(string)||'*'; RIGHT='*'||RIGHT(string)||'*'; TRIM_LEFT='*'||TRIM(LEFT(string))||'*'; TRIM_RIGHT='*'||TRIMN(RIGHT(string))||'*'; COMPRESS='*'||COMPRESS(string)||'*'; COMPBL='*'||COMPBL(string)||'*'; run;

TRIM TRIMN and STRIP

SAS Character Functions That Compare Strings (Exact and “Fuzzy” Comparisons)

The functions in this section allow you to compare strings that are exactly similar or not exact matches.

COMPARE

The COMPARE is used to compare two character strings. There are modifiers for the COMPARE function which you can use to remove leading blanks, ignore case, truncate the longer string to the length of the shorter string, and strip quotation marks.

SYNTAX : COMPARE(string-1,string-2,)

Modifiers:

i ignore-case

ignore-case l removes leading blanks

removes leading blanks n removes quotation marks

removes quotation marks : truncate the longer string to the length of shorter string.

The function returns a 0 is returned when the two strings match (after any modifiers are applied).

A non-zero value is returned if the two strings differ. If string-1 comes before string-2, a negative value is returned else the positive value is returned.

Note that the order of the modifiers is important. As in the below example.

The function returns a 0 when the two strings match (after any modifiers are applied).

For the below examples let string1 = “AbC”, string2 = “ABC”, string3 = ” ‘ABC’n”, string4 =”ABCXYZ”

Function Results COMPARE(string1,string4) 2 (“B” comes before “b”) COMPARE(string4,string1) -2 COMPARE(string1,string2,’i’) 1 COMPARE(string1,string4,’:I’) 0 COMPARE(string1,string3,’nl’) 4 COMPARE(string1,string3,’ln’) 1

CALL COMPCOST, COMPGED, and COMPLEV

COMPGED and COMPLEV are both used to determine the similarity between two strings. The COMPCOST call routine allows you to customize the scoring system when you are using the COMPGED function.

COMPGED computes a quantity called generalized edit distance. This is useful in matching names that are not spelt exactly the same. For example JohnDoe@abc.com and John_Doe@abc.com.

Larger values indicate there are more dissimilarities between the two strings.

COMPLEV performs a similar function but uses a method called the Levenshtein edit distance. The Levenshtein edit distance measures the number of operations required for a single character to transform string-1 into string-2. Operations can be deletion, insertion or replacement.

CALL COMPCOST

To determine the similarity between two strings, using a method called the generalized edit distance.

The cost is computed based on the difference between the two strings.

Since there is a default cost associated with every operation used by COMPGED, you can use that function without using COMPCOST at all. You need to call this function only once in a data step.

For a complete list of operations and cost, you can refer to the SAS Documentation.

SYNTAX : CALL COMPCOST('operation-1', cost-1 <,'operation- 2', cost-2 …>)

Operation is a keyword, placed in quotation marks. A few pf the keywords are DELETE, REPLACE, SWAP, TRUNCATE. Cost is a value associated with the operation. Cost values ranges from -32, 767 to +32,767

Example:

CALL COMPCOST('REPLACE=', 100, 'SWAP=', 200); CALL COMPCOST('SWAP=', 150);

COMPGED

This function is used to compute the similarity between two strings, using a method called the generalized edit distance.

This function can be used in conjunction with CALL COMPCOST if you want to alter the default costs for each type of spelling error.

SYNTAX : COMPGED(string-1, string-2 <,maxcost><,'modifiers'>)

Maxcost, is the maximum cost that will be returned by the COMPLEV function. If the cost computation results in a value larger than maxcost, the value of maxcost will be returned.

Modifiers:

i ignore-case

ignore-case l remove leading blanks

remove leading blanks n removes quotations marks from any argument and ignore-case

removes quotations marks from any argument and ignore-case : Colon modifier truncates the longer string to the length of the shorter string.

Note: If multiple modifiers are used, the order of the modifiers is important and they are applied in the same order as they appear.

String1 String2 Function Results SAME SAME COMPGED(STRING1, STRING2) 0 case CASE COMPGED(STRING1, STRING2) 500 case CASE COMPGED(STRING1, STRING2,’I’) 0 case CASE COMPGED(STRING1, STRING2,999,’I’) 0 Ron Run COMPGED(STRING1, STRING2) 100

COMPLEV

This function uses the Levenshtein edit distance method to compute the similarity between two strings.

COMPLEV is similar to the COMPGED function. However, It uses less compute resources than COMPGED function but may not do an as good job of matching misspelt names.

SYNTAX : COMPLEV(string-1, string-2 <,maxcost> <,'modifiers'>)

String1 String2 Function Results SAME SAME COMPLEV(STRING1, STRING2) 0 case CASE COMPLEV(STRING1, STRING2) 4 case CASE COMPLEV(STRING1, STRING2,’I’) 0 case CASE COMPLEV(STRING1, STRING2,999,’I’) 0 Ron Run COMPLEV(STRING1, STRING2) 1

SOUNDEX

The SOUNDEX function creates a phonetic equivalent of a text string to facilitate “fuzzy” matching. This function is often used to attempt match names where there might be some minor spelling differences.

SYNTAX : SOUNDEX(character-value)

SPEDIS

The SPEDIS functions compute a “spelling distance” between two words.

The spelling distance is 0 if two words are identical.

SPEDIS assigns penalty points for each type of spelling error. For example, if the first of the two words are completely different, there is a large penalty whereas if the letters are not in order there is a smaller penalty.

The final spelling distance is also based on the length of words being matched. For a long word, a wrong letter results in a smaller score than a wrong letter in a shorter word.

SYNTAX : SPEDIS(word-1,word-2)

For these examples WORD1=”Steven” WORD2 = “Stephen” and WORD3 =

“STEVEN”

Function Returns SPEDIS(WORD1, WORD2) 25 SPEDIS(WORD2, WORD1) 28 SPEDIS(WORD1, WORD3) 83 SPEDIS(WORD1, “Steven”) 0

SAS Character Functions to Split Strings into “Words”

For dividing a string into words SAS has two powerful functions. Words can be characters separated by blanks or other delimiters that you specify.

The two functions – SCAN and SCANQ is used to split strings into words. Both functions are similar, however, SCANQ function has some additional features and there is a difference of the default delimiter used in these two functions.

SCAN

SCAN function extracts a specified word from a character expression, where the word is defined as the characters separated by a set of specified delimiters. The length of the returned variables is 200 unless previously defined.

SYNTAX : SCAN(character-value, n-word <,'delimiter-list'>)

The n-word is the nth “word” in the string.

A ‘n’ value greater than the number of words, returns a value that contains no characters. For negative ‘n‘ values, the character value is scanned from right to left. A value of zero is invalid.

SCANQ

SCANQ is also used to extract a specified word from a character expression. The difference between being that the default set of delimiters and a value of 0 for the word count does not result in an error message.SCANQ also ignores delimiters enclosed in quotation marks.

SYNTAX : SCANQ(character-value, n-word <,'delimiter-list'>)

Delimiter-list is an optional argument. If it is omitted, the default delimiter list is white space characters, tab, carriage return, line feed, and form feed).

Examples:

Functions Functions SCAN(“ABC DEF”,2) DEF SCAN(“ABC DEF” HIJ,-1) HIJ SCAN(“ABC DEF”,0) An error in the log SCANQ(“ABC DEF”,0) No characters

SAS Character Functions to Substitute Letters or Words in Strings

TRANSLATE can substitute one character from another in a string. TRNWRD can substitute a word or several words with one or more words.

TRANSLATE

Translate is used to replace one character value with another.

SYNTAX : TRANSLATE(character-value, replace-with,find-string)

Example:

Function Result TRANSLATE(“12X45”, “ABCDE”, “12345”) ABXDE TRANSLATE(“Y”, “10”, “YN”) 1

In the above example, 12345 are the characters to be translated and ABCDE is the characters to be replaced with. So, 1 is replaced with A,2 with B, 3 is not found in the character value so, X is not replaced.

TRANWRD

TRANWRD is used to replace one or more words with a replacement word.

SYNTAX : TRANWRD(character-value, find-string,replace-with)

Example:

Function Results TRANWRD(“Now is the time”,”is”,”is not”) Now is not the time

TRANSTRN

The TRANSTRN function removes all occurrences of a substring in a given character string.

SYNTAX : TRANSTRN( source-expression, target-expression, replacement-expression)

Example:

data _null_; string1='*' || transtrn('abcxabc', 'abc', trimn(' ')) || '*'; str="Hello World"; str2='*' || transtrn(str, 'Hello', trimn(' ')) || '*'; put str2; str3='*' || tranwrd(str, 'Hello', trimn(' ')) || '*'; put str3; run;

In, the above example, TRANSTRN removed the first argument “Hello” whereas TRANWARD added a blank character to the string.

Output:

* World* * World*

SAS Character Functions that compute the length of strings

There are four character functions that can compute the length of character values.

LENGTH function returns the length of a given string. It does not count trailing blank in its calculation.

LENGTHN functions return a value of 0 if there is a null string whereas LENGTH function returns 1.

The LENGTHC functions count trailing blanks in its computation.

LENGTHM is used to compute the length used to store this variable in memory.

SAS Character Functions to count the number of letters or substrings in a string

The COUNT function counts the number of times a given string appears in a string while the COUNTC function counts the number of times specific characters occurs in a string.

COUNT

This character function number of times a given substring appears in a string. The case can be ignored by using modifiers. If no occurrences of the substring

are found, the function returns a 0.

SYNTAX : COUNT(character-value, find-string <,'modifiers'>)

Modifiers:

i : Ignore-case

: Ignore-case t : Ignores trailing blanks in both character value and the find string.

COUNTC

COUNTC is used to count the number of individual characters that appear or do not appear in a string. With the use of a modifier, the case can be ignored.

There is another modifier in the COUNTC function which allows you to count characters that do not appear in the string. It returns 0 if specified characters are found.

SYNTAX : COUNTC(character-value, characters <,'modifiers'>)

Modifiers:

i : Ignore-case

: Ignore-case t : Ignores trailing blanks in both character value and the find string.

: Ignores trailing blanks in both character value and the find string. o : This modifier process the characters or modifiers only once.

: This modifier process the characters or modifiers only once. v: This modifier is used to count only the characters that do not appear in the character value.

Functions Results COUNT(“ding and dong”,”g “) 1 COUNTC(“ABCDEabcde”,”BCD”,’VI’) 4 (A, E, a, and e)

Other useful Functions

QUOTE and DEQUOTE

The QUOTE function inserts double quotation marks to a string. If double quotation marks are already found within the argument, they are doubled in the output.

Example:

str1="hello world"; str2=quote(str1); str3=dequote(str2); put str1 str2 str3

OUTPUT:

hello world "hello world" hello world

The DEQUOTE functions remove quotes from a string. The DEQUOTE, COMPRESS, and TRANSTRN functions can each be used to remove quotes.

DEQUOTE not only removes pairs of quotes, but it will also truncate the remainder of the string. COMPRESS and TRANSTRN can replace all occurrences without looking for quote pairs.

When you apply the DEQUOTE function, the following things happen,

If there are no quotation marks, there is no change in the output.

No leading quotes there is no change in the output.

Matching double and single quotation marks are removed from the resultant string.

Paired double and single quotation marks are reduced to one.

Single or double quotation marks inside single/double quotation marks are unchanged.

If there are unmatched double quotes, the quotes are removed.

Unmatched single quotes remain unchanged

Any text which is placed after the matching quotes mark is deleted.

Example:

string1 = "'CA', ""OR"", 'WA'"; dq1 = dequote(string1);

string1 = "'CA', ""OR"", 'WA'";

dq1 = dequote(string1);

dq1=CA

In the above example, DEQUOTE removes the quotes from CA and truncates the remainder of the string.

WHICHC

The WHICHC function is used to search through a given list of arguments and return the index of the first one that matches the given list.

The count starts with the second argument. If the second argument matches the first argument it returns 1 if it matches with the third argument it returns 3. If no match is found in the list, it returns 0. If multiple matches are found, it returns the index of 1st matched argument.

SYNTAX : WHICHC(string, value-1 <, value-2, …>)

Example:

Fruits=whichc('apple','orange','pear','apple','apple','fig'); put Fruit=;

Fruit=3

CHOOSEC

The CHOOSEC function returns the value of the index provided in the first argument. If a negative index is given, the function counts backwards in the list and return that value.

SYNTAX : CHOOSEC (index-expression, selection-1 <,…selection-n>)

Example:

Fruit=choosec(2,'apple','orange','pear','fig'); put Fruit=;

Fruit=orange

MISSING

This function is used to determine if a charter or numeric argument is missing. The function return1 if the value is missing else it returns 0.

COALESCEC

The COALESCEC accepts one or more character arguments and then returns the first non-missing value from a list of character arguments. This function may be useful if you have a long list of values but may require to find the first non-missing value.

SYNTAX : COALESCEC(argument-1 < …, argument-n>)

Example:

y = COALESCEC ('','Hello', 'World'); put y;

y=Hello

RANK

The RANK function is used to obtain the relative position(ASCII) of the characters.

This can be useful if you want to associate each character with a number so that

an ARRAY subscript can point to a specific character.

Example:

RANK("A") returns 65 and RANK("a") returns 97

REPEAT

The REPEAT function is used to make multiple copies of a string.

SYNTAX : REPEAT(character-value, n)

n is the number of repetitions. The result of this function s the original string plus n repetitions.

Example:

REPEAT("HELLO ",3) returns HELLO HELLO HELLO HELLO

SAS REVERSE Function

You can SAS REVERSE function can be used for advanced string manipulation SAS. The REVERSE function is used to reverse the order of text of a character value.The output variable length is given the length of its first argument if the input variable length is not defined in a DATA step.

Note: Trailing blanks in the argument become leading blanks in the result.

SYNTAX : REVERSE(character-value)

Example: