Difference between revisions of "Regular expression"

Revision as of 14:28, 28 March 2020

Summary

Description what this documentation is about

History of Regular Expressions

Regular Expressions, or regexes in short, is a defined sequence of characters that specify a search pattern. These expressions are used by search machines like a search engine for optimized string searching. The concept of regular expressions arose in the 1950 by the mathematician Stephen Cole Kleene. Kleene described regular languages wit the usage of his mathematical notation called regular events. His notation got used for automata theory and the classification and description of formal languages. With the Rise of Computers many more syntaxes for writing regular expressions got developed. The most popular today are the POSIX standard and the Perl Compatible Regular Expressions (PCRE). Today regexes are implemented into many programming languages like PHP, Python and Java. Furthermore, companies started to produce hardware, FPGA and GPU implementation of PCRE to overcome the performance which is possible by CPUs.

Definition of Regular Expressions

Regular expression gets used to create a pattern that defines the regular language. A finite automata can inspect if the inputted word belongs to the regular language which gets defined by the created pattern.

Syntax of Regular Expressions

The regular expression is using the character of the alphabet Σ. Every alphabet consists at least of the Symbols ε and ∅ and of the operators +, · and *.

The amount of regular expressions over the alphabet Σ gets defined recursively as follows:

ε and ∅ are regular expressions
Every character a ∈ ∑ is a regular expression
If α and β are regular expressions, (α β) and (α + β) are also regular expressions
If α is a regular expression, α* is ale a regular expression

Operator priorities:

* ties more than ·, which ties stronger than +

Semantics of Regular Expressions

a) The Language L(α) established by the regular expression α is defined as follows:

L(∅) = ∅ empty set
L(ε) = {ε} empty string
L(a) = {a} for a ∈ ∑ literal character
L(αβ) = L(α)L(β) concatenation
L(α+β) = L(α) ∪ L(β) alternation
L(α)* = L(α)* (Kleene star) arbitrary repetition

b) The two regular expressions α and β are equivalent in character α ≡ β if L(α) = L(β)

Regular expressions and deterministic finite automaton (DFA)

By using a regex processor, the regular expression translated into an internal representation. One possible approach is to use the Thompson's construction algorithm to develop nondeterministic finite automaton (NFA) that gets translated into a deterministic finite automaton (DFA), which only accepts words based on the regular expression.

Example:

 (1 + 01 + 001)∗(ε+ 0 + 00)

Speaks the same language of the following DFA:

Insert image

References

Konzepte der Informatik von Peter Lory gelehrt an der FH-Campus Wien
https://en.wikipedia.org/wiki/Regular_expression
https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions
https://www.w3schools.com/python/python_regex.asp
https://docs.python.org/3/library/re.html

Difference between revisions of "Regular expression"

Revision as of 14:28, 28 March 2020

Contents

Summary

History of Regular Expressions

Definition of Regular Expressions

Syntax of Regular Expressions

Semantics of Regular Expressions

Regular expressions and deterministic finite automaton (DFA)

References

References

References

Navigation menu

@@ Line 41: / Line 41: @@
 Example:
    (1 + 01 + 001)∗(ε+ 0 + 00)
-  Speaks the same language of the following DFA:
+Speaks the same language of the following DFA:
+''''' Insert image '''''
 == References ==

Difference between revisions of "Regular expression"

Revision as of 14:28, 28 March 2020

Summary

History of Regular Expressions

Definition of Regular Expressions

Syntax of Regular Expressions

Semantics of Regular Expressions

Regular expressions and deterministic finite automaton (DFA)

References

References

References

Navigation menu

Search