[Notes] Regular Expression Libraries

November 18, 2009 · Galaxy

For now I only write small programs, so for the time being I only care about the C libraries.

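TRE feels like little more than the GNU C library's regex code pulled out into a standalone library: it can find the start and end offsets of a match. That's simple enough, but you still have to write your own code to copy out the matched characters, so it doesn't seem very useful to me.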

PCRE stands for Perl Compatible Regular Expressions. Its feature set is close to Perl's, so as a Perl programmer it's my first choice.

regex? No idea...
The latest release I saw was an alpha from 2008, so I didn't bother trying it. Has development stopped?

For C++ there is a dedicated option:

boost
If you write C++ on Linux without boost, you've more or less wasted your life, probably...
Not that I'm saying you can't use PCRE from C++.


http://www.dmoz.org/Computers/Programming/Languages/Regular_Expressions/C_and_C%2b%2b/

  • C++ Regular Expression Library – A free component that enables the use of regular expression searching in a C++ program.
  • Grammar to parser classes – C++ template classes for declaring grammars directly in the code as set of compound classes. Includes regexp_parser class for parsing input upon regular expression definition provided in its constructor.
  • Oniguruma – A C regular expression library, developed for the programming language Ruby. Provides software-download, description, links and references.
  • PCRE – Perl Compatible Regular Expressions – A C library for matching regular expressions with Perl 5 syntax and semantics. PCRE has its own native API, as well as a set of wrapper functions that correspond to the POSIX regular expression API.
  • PCRE Win32 – Provides compiled PCRE libraries for Windows developers, and source code to build with Visual C++.
  • regex – A modified version of Henry Spencer’s regular expression library (Autoconf, Automake and Libtool scripts have been added and a few file names have been changed). Also related links.
  • TRE – Lightweight, robust, and almost fully POSIX compliant regexp matching library which supports approximate matches. [GNU GPL].
  • xpressive – A C++ regex template library that allows regexes to be written as strings or as expression templates and to refer to themselves and other regexes recursively.

  • Text processing for C/C++ programmers – John Maddock, the author of RegEx++, explains how to use Regular Expressions in C/C++ programs. (October 1, 2001)


http://arglist.com/regex/

Other regular expression libraries


http://billposer.org/Linguistics/Computation/Resources.html#patterns

Regular Expressions and Other Pattern Matching

Tools and Libraries

AT&T Finite State Morphology Library and Lextools
Tools for building, combining, optimizing, and searching weighted finite-state acceptors and transducers.
agrep
An approximate regular expression matcher. This is the older of two approximate regular expression matchers, sometimes referred to as Wu-Manber agrep after its original authors. The source code for the Unix version is available here. Another agrep is provided as part of the TRE regular expression package.
Bison
A parser generator. The input consists of a context-free grammar in a notation similar to BNF together with associated code. This is the GNU implementation of the classic Unix YACC. It is designed to work well with Flex but may be used separately. PyBison is a Python interface to Bison.
CL-PCRE library
A Perl-compatible regular expression library for Common Lisp.
CHSM
A code generator in the tradition of yacc and bison that generates Concurrent Hierarchical State Machines.
The machines are described in a statechart specification language and annotated with code in either C++ or Java. The generated code is fully object oriented, allowing multiple machines to exist concurrently. The CHSM run-time library is small, efficient, and thread-safe.
Daciuk’s Finite State Automaton Utilities
A variety of tools for working with finite state automata and transducers.
Dia2fsm
A tool that takes as input a diagram of a finite state machine in
dia format and generates C or C++ code implementing it.
Finite State Automata Utilities
A collection of utilities for manipulating regular expressions, finite-state automata and finite-state transducers.
Manipulations include automata construction from regular expressions, determinization, minimization, composition, complementation, intersection, and Kleene closure. Various visualization tools are available for browsing finite-state automata. Interpreters are provided to apply finite automata. Finite automata can also be compiled into stand-alone C programs.
Flex
A lexical-analyzer generator. This is the GNU implementation of the
classic Unix Lex.
Glark
Glark adds several special features to regular-expression matching facilities very similar to those of grep.
It allows Boolean combinations of search predicates and it allows specifications of how far apart (in lines) the matches to different parts of a Boolean expression must be. It is possible, for instance, to ask for the set of lines containing both A and B no more than K lines apart.
Glark also provides optional color highlighting of matches, allows the user to specify how much context to provide for matches (e.g., “show me the six lines surrounding a match”) and allows for considerable control over multi-file searches and what information they produce (e.g. name of matching file only, name and matching lines, etc.).
Grail
A symbolic computation environment for finite-state machines, regular expressions, and other formal language theory objects.
Groningen Finite State Automaton Utilities
A collection of utilities to manipulate regular expressions, finite-state automata and finite-state transducers.
grep
GNU grep. For another kind of grep try here
HyperLex
A system for performing feature-based regular expression searches on lexical databases.
Kiki
A front end to the Python re module for testing regular expressions
against a sample text that provides extensive output about the results,
including highlighting of groups within a match.
Kodos
A tool for creating, testing and debugging regular expressions for the Python
programming language.
Kregexpeditor
A graphical tool for constructing regular expressions in a fashion somewhat like
a diagram editor. Generates regular expressions in the syntax of either the Qt windowing
toolkit or emacs. This is part of the KDE package and so does not have its own website
for downloading.
Levenshtein
A Python library for computing various measures of string similarity (Levenshtein,
Hamming, Jaro, Jaro-Winkler) and related functions, such as applying edits.
Match
A library callable from C, C++, and Ada that provides a pattern matcher
inspired by that of SNOBOL4.
monq.jfa
A Java class library for finite state automata. Unlike the standard java.util.regex,
which provides only recognizers and substitution, it allows actions to be bound to regular
expressions so that the action is performed whenever the regular expression is matched.
PCRE library
Perl compatible regular expression library.
Pmatch
A regular expression matching tool similar to grep but based on the PCRE
library and with highlighting of matches and display of surrounding lines.
PC-KIMMO
Implementation of Kimmo Koskenniemi’s Two-Level Morphology
QFSM
A graphical tool for designing finite state machines.
Ragel State Machine Compiler
Ragel compiles finite state machines from regular languages into C, C++, or Objective-C code.
It allows the programmer to embed actions at any point in a regular language.
Redet [Regular Expression Development and Execution Tool]
Redet allows the user to construct regular expressions and test them against input data by executing any of more than 40 search programs, editors, and programming languages
that make use of regular expressions or similar patterns. Redet is written in Tcl, which is therefore always available. Other matchers are executed as child processes
if they are available on the user’s system. When a suitable regular expression has been constructed, it may be saved to a file. For each program, a palette showing the
available regular expression syntax is provided. Selections from the palette may be copied to the regular expression window with a mouse click. Users may add their own definitions to the palette via their initialization file. So long as the underlying program supports Unicode, redet allows UTF-8 Unicode in both test data and
regular expressions. Although the primary function of Redet is to provide
a convenient interface to the actual regular expression tools, it also provides some
extensions of particular interest to linguists. Redet allows you to define
your own named character classes and provides a notation for taking their intersection.
Together, these two capabilities make it possible to perform searches on feature matrices.
re_graph
Given a regular expression draws a diagram of the corresponding finite state automaton.
The Regex Coach
A tool for experimenting with regular expressions. It can single-step through the
matching process as performed by the regex engine and can show a graphical representation of the regular expression’s parse tree. Uses Perl-style regular expressions.
Regex Test
Given a file of sample text, displays the text and allows the user to enter
regular expressions. As the user types, it matches the regular expression against
the sample text and highlights the matching portions.
Regexopt
A program that takes as input a regular expression (in a large subset of
Perl syntax) and produces a more compact equivalent regular expression.
Sed
The standard Unix stream editor. It provides regular expression searches and
substitutions. The GNU sed manual is available at
this site.
The source code may be had here.
There are quite a few versions of sed available, with implementations for a wide
variety of architectures and operating systems. Links to various versions are available
here together
with links to debuggers, tutorials, and other information.
If you find sed too complicated and just want to replace fixed strings,
you might try replace.
Sgrep
A tool for searching and indexing text, SGML, XML and HTML files and filtering text
streams using structural criteria.
Sgrep
A stanza grep tool, which is a more general interface into searching through
IOS configurations (or any file that has a ‘stanza’-like format).
sgrep can also match IP addresses, and can even match IP addresses inside a subnet.
Ssed – Super Sed
This is an enhanced version of the standard Unix stream editor sed. It provides extended regular expression syntax and large increases in speed in certain cases.
State Machine Compiler
Given a file containing a description of a finite state machine in a simple
language, generates code for implementing the machine in
C, C++, C#, Java, Perl, Python, Ruby, Tcl, and VB.net.
Stuttgart Finite State Transducer
A toolbox for the implementation of morphological analyzers and other tools based on finite state transducer technology. This is the closest non-proprietary equivalent to the
Xerox Finite State Calculus.
Theo
A simulator for finite automata and Turing machines. Written in Java so available for most systems.
TRE regexp library
A library implementing an efficient new algorithm, with C and Python bindings. In addition
to classical syntax it provides some GNU and Perl extensions. It also provides
approximate matching and allows costs to be set in-line, individually for each
group. Wide (UTF-32) and multibyte (UTF-8) characters are supported.
An approximate grep command called agrep using the library is also supplied.
This version of agrep is largely compatible with the
older Wu-Manber agrep at the command-line level but
is more powerful in some respects.
Txt2regex
Txt2regex is a regular expression wizard that converts human sentences to regexes.
In a simple interactive console interface, the user answers questions and the
program constructs the regular expression. Over 20 programs are supported.
Xerox Finite State Calculus
The lexc lexicon compiler and xfst rule compiler. These compile into finite state automata.
XFA
A C library for creating non-deterministic finite state automata, either programmatically
or from regular expressions and for converting them to the minimal equivalent deterministic
finite state automaton.
Xmlgrep
A command-line utility that matches regular expressions against strings with XML markup.

Tutorials

Miscellaneous
