[XRegExp](http://xregexp.com/)
==============================

XRegExp provides augmented, extensible, cross-browser JavaScript regular expressions. You get new syntax and flags beyond what browsers support natively, along with a collection of utils to make your client-side grepping and parsing easier. XRegExp also frees you from worrying about pesky inconsistencies in cross-browser regex handling and the dubious `lastIndex` property.

XRegExp supports all native ES5 regular expression syntax. It's about 3.5 KB when minified and gzipped. It works with Internet Explorer 5.5+, Firefox 1.5+, Chrome, Safari 3+, and Opera 9.5+.
## Performance

XRegExp regular expressions compile to native `RegExp` objects, so there is no performance difference when using XRegExp objects with native methods. There is a small performance cost when *compiling* XRegExps. You can, however, use `XRegExp.cache` to avoid incurring the compilation cost for a given pattern more than once. Doing so can even make XRegExp faster than native regexes in synthetic tests that repeatedly compile the same regex.
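
The caching idea can be sketched with native `RegExp` objects alone. The snippet below only illustrates memoizing compiled regexes by pattern and flags; it is not XRegExp's actual implementation, and `regexCache` and `cachedRegex` are hypothetical names.

```js
// Hypothetical helper illustrating the idea behind XRegExp.cache:
// compile each pattern/flags pair once, then hand back the same object.
var regexCache = {};

function cachedRegex(pattern, flags) {
    flags = flags || '';
    // Flags contain only letters, so the first '/' cleanly separates them
    // from the pattern and the key is unambiguous
    var key = flags + '/' + pattern;
    if (!Object.prototype.hasOwnProperty.call(regexCache, key)) {
        regexCache[key] = new RegExp(pattern, flags);
    }
    return regexCache[key];
}

var first = cachedRegex('\\d+', 'g');
var second = cachedRegex('\\d+', 'g');
first === second; // -> true (the second call skipped compilation)
```

Because the sketch returns the same object on every call, a cached regex with the `g` flag shares its `lastIndex` among all callers.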
## Usage examples

~~~ js
// Using named capture and flag x (free-spacing and line comments)
var date = XRegExp('(?<year> [0-9]{4}) -? # year \n\
                    (?<month> [0-9]{2}) -? # month \n\
                    (?<day> [0-9]{2}) # day ', 'x');

// XRegExp.exec gives you named backreferences on the match result
var match = XRegExp.exec('2012-02-22', date);
match.day; // -> '22'

// It also includes optional pos and sticky arguments
var pos = 3, result = [];
while (match = XRegExp.exec('<1><2><3><4>5<6>', /<(\d+)>/, pos, 'sticky')) {
    result.push(match[1]);
    pos = match.index + match[0].length;
} // result -> ['2', '3', '4']

// XRegExp.replace allows named backreferences in replacements
XRegExp.replace('2012-02-22', date, '${month}/${day}/${year}'); // -> '02/22/2012'
XRegExp.replace('2012-02-22', date, function (match) {
    return match.month + '/' + match.day + '/' + match.year;
}); // -> '02/22/2012'

// In fact, all XRegExps are RegExps and work perfectly with native methods
date.test('2012-02-22'); // -> true

// The *only* caveat is that named captures must be referred to using numbered backreferences
'2012-02-22'.replace(date, '$2/$3/$1'); // -> '02/22/2012'

// If you want, you can extend native methods so you don't have to worry about this
// Doing so also fixes numerous browser bugs in the native methods
XRegExp.install('natives');
'2012-02-22'.replace(date, '${month}/${day}/${year}'); // -> '02/22/2012'
'2012-02-22'.replace(date, function (match) {
    return match.month + '/' + match.day + '/' + match.year;
}); // -> '02/22/2012'
date.exec('2012-02-22').day; // -> '22'

// Extract every other digit from a string using XRegExp.forEach
XRegExp.forEach('1a2345', /\d/, function (match, i) {
    if (i % 2) this.push(+match[0]);
}, []); // -> [2, 4]

// Get numbers within <b> tags using XRegExp.matchChain
XRegExp.matchChain('1 <b>2</b> 3 <b>4 a 56</b>', [
    XRegExp('(?is)<b>.*?</b>'),
    /\d+/
]); // -> ['2', '4', '56']

// You can also pass forward and return specific backreferences
var html = '<a href="http://xregexp.com/">XRegExp</a>\
            <a href="http://www.google.com/">Google</a>';
XRegExp.matchChain(html, [
    {regex: /<a href="([^"]+)">/i, backref: 1},
    {regex: XRegExp('(?i)^https?://(?<domain>[^/?#]+)'), backref: 'domain'}
]); // -> ['xregexp.com', 'www.google.com']

// XRegExp.union safely merges strings and regexes into a single pattern
XRegExp.union(['a+b*c', /(dogs)\1/, /(cats)\1/], 'i');
// -> /a\+b\*c|(dogs)\1|(cats)\2/i
~~~
These examples should give you the flavor of what's possible, but XRegExp has more syntax, flags, utils, options, and browser fixes that aren't shown here. You can even augment XRegExp's regular expression syntax with addons (see below) or write your own. See [xregexp.com](http://xregexp.com/) for more details.

## Addons

In browsers, you can either load addons individually, or bundle all addons together with XRegExp by loading `xregexp-all.js`. XRegExp's [npm](http://npmjs.org/) package uses `xregexp-all.js`, which means that the addons are always available when XRegExp is installed on the server using npm.
### XRegExp Unicode Base

In browsers, first include the Unicode Base script:

~~~ html
<script src="xregexp.js"></script>
<script src="addons/unicode/unicode-base.js"></script>
~~~

Then you can do this:

~~~ js
var unicodeWord = XRegExp('^\\p{L}+$');
unicodeWord.test('Русский'); // -> true
unicodeWord.test('日本語'); // -> true
unicodeWord.test('العربية'); // -> true
~~~

The base script adds `\p{Letter}` and its alias `\p{L}`, but other Unicode categories, scripts, blocks, and properties require addon packages. Try these next examples after additionally including `unicode-scripts.js`:

~~~ js
XRegExp('^\\p{Hiragana}+$').test('ひらがな'); // -> true
XRegExp('^[\\p{Latin}\\p{Common}]+$').test('Über Café.'); // -> true
~~~

XRegExp uses the Unicode 6.1 Basic Multilingual Plane.
### XRegExp.build

In browsers, first include the script:

~~~ html
<script src="xregexp.js"></script>
<script src="addons/build.js"></script>
~~~

You can then build regular expressions using named subpatterns, for readability and pattern reuse:

~~~ js
var time = XRegExp.build('(?x)^ {{hours}} ({{minutes}}) $', {
    hours: XRegExp.build('{{h12}} : | {{h24}}', {
        h12: /1[0-2]|0?[1-9]/,
        h24: /2[0-3]|[01][0-9]/
    }, 'x'),
    minutes: /^[0-5][0-9]$/
});

time.test('10:59'); // -> true
XRegExp.exec('10:59', time).minutes; // -> '59'
~~~

Named subpatterns can be provided as strings or regex objects. A leading `^` and trailing unescaped `$` are stripped from subpatterns if both are present, which allows embedding independently useful anchored patterns. `{{…}}` tokens can be quantified as a single unit. Backreferences in the outer pattern and provided subpatterns are automatically renumbered to work correctly within the larger combined pattern. The syntax `({{name}})` works as shorthand for named capture via `(?<name>{{name}})`. Named subpatterns cannot be embedded within character classes.

See also: *[Creating Grammatical Regexes Using XRegExp.build](http://blog.stevenlevithan.com/archives/grammatical-patterns-xregexp-build)*.
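
To make the substitution rules described above more concrete, here is a minimal sketch using only native `RegExp`. It is not the addon's implementation: `buildSketch` is a hypothetical helper that handles plain `{{name}}` tokens and anchor stripping, and it deliberately leaves out backreference renumbering and the `({{name}})` named-capture shorthand.

```js
// Hypothetical, simplified stand-in for the token substitution step.
// No backreference renumbering, no ({{name}}) shorthand.
function buildSketch(pattern, subs, flags) {
    var source = pattern.replace(/\{\{([\w$]+)\}\}/g, function (token, name) {
        if (!Object.prototype.hasOwnProperty.call(subs, name)) {
            throw new ReferenceError('Undefined subpattern: ' + name);
        }
        var sub = subs[name];
        var src = sub instanceof RegExp ? sub.source : String(sub);
        // Strip a leading ^ and trailing $ only if both are present and the
        // $ is unescaped (preceded by an even number of backslashes)
        var anchored = /^\^([\s\S]*)\$$/.exec(src);
        if (anchored && /\\*$/.exec(anchored[1])[0].length % 2 === 0) {
            src = anchored[1];
        }
        // A non-capturing group lets the token be quantified as one unit
        return '(?:' + src + ')';
    });
    return new RegExp(source, flags || '');
}

var timeSketch = buildSketch('^{{hours}}:{{minutes}}$', {
    hours: /1[0-2]|0?[1-9]/,
    minutes: /^[0-5][0-9]$/
});
timeSketch.test('10:59'); // -> true
timeSketch.test('10:61'); // -> false
```

Wrapping each subpattern in `(?:…)` is what keeps alternations such as `1[0-2]|0?[1-9]` from leaking into the surrounding pattern.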
### XRegExp.matchRecursive

In browsers, first include the script:

~~~ html
<script src="xregexp.js"></script>
<script src="addons/matchrecursive.js"></script>
~~~

You can then match recursive constructs using XRegExp pattern strings as left and right delimiters:

~~~ js
var str = '(t((e))s)t()(ing)';
XRegExp.matchRecursive(str, '\\(', '\\)', 'g');
// -> ['t((e))s', '', 'ing']

// Extended information mode with valueNames
str = 'Here is <div> <div>an</div></div> example';
XRegExp.matchRecursive(str, '<div\\s*>', '</div>', 'gi', {
    valueNames: ['between', 'left', 'match', 'right']
});
/* -> [
{name: 'between', value: 'Here is ', start: 0, end: 8},
{name: 'left', value: '<div>', start: 8, end: 13},
{name: 'match', value: ' <div>an</div>', start: 13, end: 27},
{name: 'right', value: '</div>', start: 27, end: 33},
{name: 'between', value: ' example', start: 33, end: 41}
] */

// Omitting unneeded parts with null valueNames, and using escapeChar
str = '...{1}\\{{function(x,y){return y+x;}}';
XRegExp.matchRecursive(str, '{', '}', 'g', {
    valueNames: ['literal', null, 'value', null],
    escapeChar: '\\'
});
/* -> [
{name: 'literal', value: '...', start: 0, end: 3},
{name: 'value', value: '1', start: 4, end: 5},
{name: 'literal', value: '\\{', start: 6, end: 8},
{name: 'value', value: 'function(x,y){return y+x;}', start: 9, end: 35}
] */

// Sticky mode via flag y
str = '<1><<<2>>><3>4<5>';
XRegExp.matchRecursive(str, '<', '>', 'gy');
// -> ['1', '<<2>>', '3']
~~~

`XRegExp.matchRecursive` throws an error if it sees an unbalanced delimiter in the target string.
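
The balancing logic can be pictured as a depth counter. The following is a minimal sketch under narrow assumptions, not the addon's code: `matchBalanced` is a hypothetical helper that accepts only two distinct single-character literal delimiters and ignores flags, `escapeChar`, and `valueNames`.

```js
// Hypothetical helper: returns the contents of each outermost delimiter
// pair and throws when the delimiters do not balance.
function matchBalanced(str, open, close) {
    var results = [];
    var depth = 0;
    var start = -1;
    for (var i = 0; i < str.length; i++) {
        var ch = str.charAt(i);
        if (ch === open) {
            if (depth === 0) start = i + 1; // contents begin after the opener
            depth++;
        } else if (ch === close) {
            if (depth === 0) {
                throw new Error('Unbalanced delimiter at index ' + i);
            }
            depth--;
            if (depth === 0) results.push(str.slice(start, i));
        }
    }
    if (depth !== 0) {
        throw new Error('Unbalanced delimiter: missing ' + close);
    }
    return results;
}

matchBalanced('(t((e))s)t()(ing)', '(', ')'); // -> ['t((e))s', '', 'ing']
```

Only the outermost pairs are reported; nested pairs stay inside the returned substrings, as in the first `XRegExp.matchRecursive` example above.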
### XRegExp Prototype Methods

In browsers, first include the script:

~~~ html
<script src="xregexp.js"></script>
<script src="addons/prototypes.js"></script>
~~~

New XRegExp regexes then gain a collection of useful methods: `apply`, `call`, `forEach`, `globalize`, `xexec`, and `xtest`.

~~~ js
// To demonstrate the call method, let's first create the function we'll be using...
function filter(array, fn) {
    var res = [];
    array.forEach(function (el) {if (fn.call(null, el)) res.push(el);});
    return res;
}

// Now we can filter arrays using functions and regexes
filter(['a', 'ba', 'ab', 'b'], XRegExp('^a')); // -> ['a', 'ab']
~~~
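
To see why a regex can stand in for a function in `filter`, here is a minimal sketch that attaches a `call` method to a single native regex. It assumes `call` ignores its context argument and simply tests the string; `withCall` is a hypothetical helper, not part of the addon.

```js
// Hypothetical helper: gives one regex a function-like `call` method.
// Assumption: the context argument is ignored and the string is tested.
function withCall(regex) {
    regex.call = function (context, str) {
        return regex.test(str);
    };
    return regex;
}

function filter(array, fn) {
    var res = [];
    array.forEach(function (el) { if (fn.call(null, el)) res.push(el); });
    return res;
}

// The same filter works with a regex or an ordinary function
filter(['a', 'ba', 'ab', 'b'], withCall(/^a/)); // -> ['a', 'ab']
filter(['a', 'ba', 'ab', 'b'], function (s) { return s.length === 2; }); // -> ['ba', 'ab']
```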
Native `RegExp` objects copied by `XRegExp` are augmented with any `XRegExp.prototype` methods. The following lines therefore work equivalently:

~~~ js
XRegExp('[a-z]', 'ig').xexec('abc');
XRegExp(/[a-z]/ig).xexec('abc');
XRegExp.globalize(/[a-z]/i).xexec('abc');
~~~
## Installation and usage

In browsers:

~~~ html
<script src="xregexp-min.js"></script>
~~~

Or, to bundle XRegExp with all of its addons:

~~~ html
<script src="xregexp-all-min.js"></script>
~~~

Using [npm](http://npmjs.org/):

~~~ bash
npm install xregexp
~~~

In [Node.js](http://nodejs.org/) and [CommonJS module](http://wiki.commonjs.org/wiki/Modules) loaders:

~~~ js
var XRegExp = require('xregexp').XRegExp;
~~~

### Running tests on the server with npm

~~~ bash
npm install -g qunit # needed to run the tests
npm test # in the xregexp root
~~~

If XRegExp was not installed using npm, just open `tests/index.html` in your browser.
## &c

**Lookbehind:** A [collection of short functions](https://gist.github.com/2387872) is available that makes it easy to simulate infinite-length leading lookbehind.

## Changelog

* Releases: [Version history](http://xregexp.com/history/).
* Upcoming: [Milestones](https://github.com/slevithan/XRegExp/issues/milestones), [Roadmap](https://github.com/slevithan/XRegExp/wiki/Roadmap).

## About

XRegExp and addons copyright 2007-2012 by [Steven Levithan](http://stevenlevithan.com/).

Tools: Unicode range generators by [Mathias Bynens](http://mathiasbynens.be/). Source file concatenator by [Bjarke Walling](http://twitter.com/walling).

Prior art: `XRegExp.build` inspired by [Lea Verou](http://lea.verou.me/)'s [RegExp.create](http://lea.verou.me/2011/03/create-complex-regexps-more-easily/). `XRegExp.union` inspired by [Ruby](http://www.ruby-lang.org/). XRegExp's syntax extensions come from Perl, .NET, etc.

All code released under the [MIT License](http://mit-license.org/).

Fork me to show support, fix, and extend.