Scala Regular Expressions
Regular expressions are strings that describe patterns, which can be used to find matches (or confirm their absence) in data. Any string can be converted to a regular expression with the .r method. Scala supports regular expressions through the Regex class in the scala.util.matching package.
In the example below, let's find the word Proedu in a sentence.
Example using the r method
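import scala.util.matching.Regex

object RegexDemo {
  def main(args: Array[String]): Unit = {
    val pattern = "Proedu".r // .r turns the string into a Regex
    val str = "Proedu is an online training provider"
    // findFirstIn returns Some(firstMatch), or None if nothing matches
    println(pattern.findFirstIn(str))
  }
}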
Save the program above as RegexDemo.scala, then compile and run it with the following commands:
scalac RegexDemo.scala
scala RegexDemo
Output
Some(Proedu)
Explanation: Here, we call the r method on the string to obtain an instance of the Regex class, which serves as the pattern. The findFirstIn method then returns the first match of the regular expression, wrapped in an Option: Some(match) if the pattern occurs in the string, or None if it does not. To find all matching words instead of only the first, use the findAllIn method.
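To make the Option result more concrete, here is a minimal sketch that pattern-matches on the result of findFirstIn and collects every match with findAllIn. The object name FindDemo and the sample sentences are made up for illustration, and the code assumes Scala 2.13 or Scala 3.

```scala
import scala.util.matching.Regex

// Hypothetical demo object; the sample sentences are illustrative only
object FindDemo {
  val pattern: Regex = "Proedu".r

  // findFirstIn wraps the result in an Option: Some(match) or None
  def describe(text: String): String =
    pattern.findFirstIn(text) match {
      case Some(word) => s"found: $word"
      case None       => "no match"
    }

  def main(args: Array[String]): Unit = {
    println(describe("Proedu is an online training provider"))
    println(describe("No training provider is named here"))
    // findAllIn returns an iterator of matched strings; toList collects them
    println(pattern.findAllIn("Proedu, Proedu and Proedu").toList)
  }
}
```

Matching on Some and None forces the "no match" case to be handled explicitly, instead of printing the raw Option as the example above does.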
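Example using the Regex constructor
Instead of the r method, we can pass the pattern string to the Regex constructor. The group (P|p) uses a pipe (|) to match either an upper-case or a lower-case first letter, so the pattern finds both Proedu and proedu. Because findAllIn returns an iterator over the matches, we use the mkString(",") method to join them into one comma-separated string.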
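import scala.util.matching.Regex

object RegexDemo {
  def main(args: Array[String]): Unit = {
    // (P|p) matches either an upper-case or a lower-case first letter
    val pattern = new Regex("(P|p)roedu")
    val str = "Proedu is an online training provider. proedu is best"
    // findAllIn returns an iterator; mkString joins the matches with commas
    println(pattern.findAllIn(str).mkString(","))
  }
}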
Save the program above as RegexDemo.scala, then compile and run it with the following commands:
scalac RegexDemo.scala
scala RegexDemo
Output
Proedu,proedu
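The same two matches can also be found with a character class such as [Pp], one of the constructs listed in the metacharacter table later on this page. The sketch below (the object name AlternationDemo is hypothetical) also uses findAllMatchIn, which returns Match objects instead of plain strings, so the position of each match is available through start.

```scala
import scala.util.matching.Regex

// Hypothetical demo object reusing the sentence from the example above
object AlternationDemo {
  val text = "Proedu is an online training provider. proedu is best"

  // Same effect as (P|p)roedu, written with a character class instead of a pipe
  val classPattern: Regex = new Regex("[Pp]roedu")

  def main(args: Array[String]): Unit = {
    // findAllIn yields the matched strings
    println(classPattern.findAllIn(text).mkString(","))

    // findAllMatchIn yields Match objects that also carry the match position
    for (m <- classPattern.findAllMatchIn(text))
      println(s"${m.matched} at index ${m.start}")
  }
}
```

A character class is a natural fit when the alternatives are single characters; the pipe form is more general because each side of | can be a whole subexpression.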
Example – Replacing the first occurrence of matching text
To replace matching text, use replaceFirstIn to replace only the first match, or replaceAllIn to replace every match.
import scala.util.matching.Regex

object RegexDemo {
  def main(args: Array[String]): Unit = {
    val p = new Regex("(P|p)roedu")
    val s = "Proedu is an online training provider. proedu is best"
    // Only the first match ("Proedu") is replaced
    println(p.replaceFirstIn(s, "PROEDU"))
  }
}
Save the program above as RegexDemo.scala, then compile and run it with the following commands:
scalac RegexDemo.scala
scala RegexDemo
Output
PROEDU is an online training provider. proedu is best
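replaceAllIn works the same way but rewrites every match. Here is a minimal sketch that reuses the (P|p)roedu pattern inside a hypothetical ReplaceAllDemo object. The second call uses the overload of replaceAllIn that takes a function from each Match to its replacement text.

```scala
import scala.util.matching.Regex

// Hypothetical demo object; pattern and sentence mirror the example above
object ReplaceAllDemo {
  val p: Regex = new Regex("(P|p)roedu")
  val s = "Proedu is an online training provider. proedu is best"

  // replaceAllIn with a plain replacement string rewrites every match
  val allUpper: String = p.replaceAllIn(s, "PROEDU")

  // replaceAllIn can also compute each replacement from the Match itself
  val capitalized: String = p.replaceAllIn(s, (m: Regex.Match) => m.matched.capitalize)

  def main(args: Array[String]): Unit = {
    println(allUpper)
    println(capitalized)
  }
}
```

In both forms, $ and \ in the replacement text are treated specially (as group references and escapes), so replacement text that must contain them literally can be passed through Regex.quoteReplacement first.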
Forming Regular Expressions
Scala inherits its regular expression syntax from Java, which in turn inherits most of the features of Perl. The examples below should be enough as a refresher.
The following table lists commonly used regular expression metacharacters supported by Java.
Subexpression | Matches |
---|---|
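^ | Matches beginning of input, or beginning of each line when the (?m) MULTILINE flag is set. |
$ | Matches end of input (or just before a final line terminator), or end of each line when the (?m) MULTILINE flag is set. |
. | Matches any single character except a line terminator. The (?s) DOTALL flag allows it to match line terminators as well. |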
[…] | Matches any single character in brackets. |
[^…] | Matches any single character not in brackets |
\\A | Beginning of entire string |
\\z | End of entire string |
\\Z | End of entire string, or just before a final line terminator. |
re* | Matches 0 or more occurrences of preceding expression. |
re+ | Matches 1 or more occurrences of preceding expression. |
re? | Matches 0 or 1 occurrence of preceding expression. |
re{n} | Matches exactly n occurrences of preceding expression. |
re{n,} | Matches n or more occurrences of preceding expression. |
re{n,m} | Matches at least n and at most m occurrences of preceding expression. |
a|b | Matches either a or b. |
(re) | Groups regular expressions and remembers matched text. |
(?:re) | Groups regular expressions without remembering matched text. |
(?>re) | Atomic group: matches independent pattern without backtracking. |
\\w | Matches word characters. Equivalent to [a-zA-Z_0-9]. |
\\W | Matches nonword characters. |
\\s | Matches whitespace. Equivalent to [ \\t\\n\\x0B\\f\\r]. |
\\S | Matches nonwhitespace. |
\\d | Matches digits. Equivalent to [0-9]. |
\\D | Matches nondigits. |
\\G | Matches point where last match finished. |
\\1, \\2, … | Back-reference to capture group number 1, 2, and so on. |
\\b | Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets. |
\\B | Matches nonword boundaries. |
\\n, \\r, \\t, etc. | Matches newline, carriage return, tab, etc. |
\\Q | Escape (quote) all characters up to \\E |
\\E | Ends quoting begun with \\Q |
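To see several of these metacharacters working together, the sketch below applies \\d, the + and {n} quantifiers, capture groups, ^ and a negated character class to one made-up sentence. The object name MetacharDemo and the sample text are illustrative assumptions.

```scala
import scala.util.matching.Regex

// Hypothetical demo object; the sample text is made up
object MetacharDemo {
  val text = "Order 42 shipped on 2024-03-15; order 7 pending"

  // \\d+ : one or more digits (the + quantifier applied to \\d)
  val numbers: List[String] = "\\d+".r.findAllIn(text).toList

  // re{n} and (re): exactly 4, 2 and 2 digits, each captured in its own group
  val datePattern: Regex = "(\\d{4})-(\\d{2})-(\\d{2})".r
  val year: Option[String] = datePattern.findFirstMatchIn(text).map(_.group(1))

  // ^ anchors the match at the start of the input
  val startsWithOrder: Boolean = "^Order".r.findFirstIn(text).isDefined

  // [^;]+ : a run of characters that are not semicolons
  val firstClause: Option[String] = "[^;]+".r.findFirstIn(text)

  def main(args: Array[String]): Unit = {
    println(numbers)
    println(year)
    println(startsWithOrder)
    println(firstClause)
  }
}
```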
Note that every backslash appears twice in the table above. This is because in Java and Scala a single backslash is an escape character in a string literal, not a regular character that shows up in the string. So instead of ‘\’, you need to write ‘\\’ to get a single backslash in the string.
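Scala also offers triple-quoted strings, in which backslash escapes such as \d and \. are kept as written, so a pattern can be spelled with single backslashes. A small sketch (the object name BackslashDemo is hypothetical) comparing both spellings of the same pattern:

```scala
import scala.util.matching.Regex

// Hypothetical demo object comparing two spellings of one pattern
object BackslashDemo {
  // Ordinary string literal: each regex backslash must be doubled
  val escaped: Regex = "\\d+\\.\\d+".r
  // Triple-quoted string: backslashes reach the regex unchanged
  val raw: Regex = """\d+\.\d+""".r

  def main(args: Array[String]): Unit = {
    val text = "Pi is roughly 3.14"
    println(escaped.findFirstIn(text))
    println(raw.findFirstIn(text))
    // .regex returns the pattern source; both literals produce the same one
    println(escaped.regex == raw.regex)
  }
}
```

Triple-quoted strings are often easier to read for patterns that contain many backslashes.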