Software for installing a smart-mirror framework on a Raspberry Pi, for accessing university-related information.

# Background

In JavaScript there is not always a one-to-one relationship between string characters (UTF-16 code units) and what a user would call a separate visual "letter". Some symbols are represented by several characters. This can cause issues when splitting strings and inadvertently cutting a multi-char letter in half, or when you need the actual number of letters in a string.
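
A minimal illustration of the splitting problem, using only built-in string methods: `String.prototype.slice` works on code units, so taking the first "character" of an emoji leaves half of a surrogate pair.

```javascript
// "🌷" (U+1F337) is stored as two UTF-16 code units: a high and a low surrogate.
const tulip = "\u{1F337}";
console.log(tulip.length); // 2

// Slicing by code units cuts the emoji in half and leaves a lone high surrogate,
// which is not a valid character on its own.
const broken = tulip.slice(0, 1);
console.log(broken === "\uD83C"); // true
console.log(broken.length); // 1
```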

For example, emoji characters like "🌷", "🎁", "💩", "😜" and "👍" are represented by two JavaScript characters each (a high surrogate and a low surrogate). That is,

```javascript
"🌷".length == 2
```

Combined emoji are even longer:

```javascript
"🏳️‍🌈".length == 6
```
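
Iterating a string with `Array.from` (or the spread operator) walks it by code point instead of by code unit. The sketch below, which relies only on built-in behavior, shows that this keeps surrogate pairs together but still breaks the rainbow flag apart, because that emoji is a sequence of four code points joined by a zero-width joiner.

```javascript
const tulip = "\u{1F337}"; // 🌷
// Rainbow flag: U+1F3F3 (white flag) + U+FE0F (variation selector-16)
// + U+200D (zero-width joiner) + U+1F308 (rainbow).
const flag = "\u{1F3F3}\uFE0F\u200D\u{1F308}";

// .length counts UTF-16 code units.
console.log(tulip.length, flag.length); // 2 6

// Array.from iterates by code point, so a surrogate pair stays together...
console.log(Array.from(tulip).length); // 1
// ...but the ZWJ sequence still falls apart into its four code points.
console.log(Array.from(flag).length); // 4
```

Code-point iteration is therefore an improvement over `.length`, but it does not give the number of user-perceived letters.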

What's more, many languages use combining marks: characters that modify the letter before them. Common examples are the German letter ü and the Spanish letter ñ. Such letters can often be represented either as a single precomposed character or as a base letter followed by a combining mark, and both forms are equally valid:

```javascript
var two = "\u006E\u0303"; // unnormalized two-char n + ◌̃, displays as "ñ"
var one = "\u00F1";       // normalized single-char "ñ"
console.log(one != two);  // prints 'true'
```
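
For the ñ case, built-in normalization is enough. A short sketch using `String.prototype.normalize`: the NFC form composes n + U+0303 into the single code point U+00F1, so the two spellings compare equal afterwards, and NFD goes the other way.

```javascript
const two = "\u006E\u0303"; // n + combining tilde
const one = "\u00F1";       // precomposed ñ

console.log(one === two);                  // false
console.log(one === two.normalize("NFC")); // true
console.log(two.normalize("NFC").length);  // 1
// NFD decomposes the precomposed letter into base letter + combining mark.
console.log(one.normalize("NFD") === two); // true
```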

Unicode normalization, as performed by the popular punycode.js library or ECMAScript 6's `String.prototype.normalize`, can **sometimes** fix those differences and turn two-char sequences into single characters. But it is **not** enough in all cases. Some languages, like Hindi, make extensive use of combining marks on their letters, and because of the sheer number of possible combinations, most of these letter + mark combinations have no dedicated single-codepoint equivalent.

For example, the Hindi word "अनुच्छेद" is composed of 5 letters and 3 combining marks (8 JavaScript characters):

अ + न + ु + च + ् + छ + े + द

which is in fact just 5 user-perceived letters:

अ + नु + च् + छे + द

Unicode normalization does not combine these into fewer characters.
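
This can be checked with the built-in `normalize` method. The sketch spells the word out with escape sequences so there is no ambiguity about which code points it contains, and shows that neither NFC nor NFD changes its length.

```javascript
// "अनुच्छेद" spelled out code point by code point:
// अ न ु च ् छ े द
const word = "\u0905\u0928\u0941\u091A\u094D\u091B\u0947\u0926";

const codePoints = Array.from(word, (c) =>
  "U+" + c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);
console.log(codePoints.join(" "));
// U+0905 U+0928 U+0941 U+091A U+094D U+091B U+0947 U+0926

console.log(word.length);                  // 8
console.log(word.normalize("NFC").length); // 8 (nothing to compose)
console.log(word.normalize("NFD").length); // 8 (nothing to decompose)
```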

There are also unusual letter + combining mark combinations that have no dedicated Unicode codepoint. The string Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘ obviously has 5 separate letters, but is in fact composed of 58 JavaScript characters, most of which are combining marks.

Enter the grapheme-splitter.js library. It can be used to properly split JavaScript strings into what a human user would call separate letters (or "extended grapheme clusters" in Unicode terminology), no matter what their internal representation is. It is an implementation of the [Default Grapheme Cluster Boundary](http://unicode.org/reports/tr29/#Default_Grapheme_Cluster_Table) of [UAX #29](http://www.unicode.org/reports/tr29/).
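
For comparison, recent JavaScript runtimes also ship a built-in segmenter, `Intl.Segmenter` with `granularity: "grapheme"`, which follows the same UAX #29 boundary rules. It is a separate API, not part of grapheme-splitter, and its results for some scripts may differ depending on the Unicode version bundled with the runtime. The sketch below assumes Node.js 16+ or a current browser; the helper names `toGraphemes` and `graphemeCount` are hypothetical.

```javascript
// Assumes a runtime that provides Intl.Segmenter (e.g. Node.js 16+).
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: split a string into extended grapheme clusters.
function toGraphemes(str) {
  return Array.from(segmenter.segment(str), (s) => s.segment);
}

// Hypothetical helper: count grapheme clusters without collecting them.
function graphemeCount(str) {
  let count = 0;
  for (const _ of segmenter.segment(str)) count++;
  return count;
}

console.log(toGraphemes("abcd")); // [ 'a', 'b', 'c', 'd' ]
// 🌷🎁💩😜👍 plus the rainbow flag ZWJ sequence
const emoji = "\u{1F337}\u{1F381}\u{1F4A9}\u{1F61C}\u{1F44D}\u{1F3F3}\uFE0F\u200D\u{1F308}";
console.log(emoji.length, graphemeCount(emoji)); // 16 6
```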

# Installation

You can use the index.js file directly as-is, or you can install `grapheme-splitter` in your project using the npm command below:

```
$ npm install --save grapheme-splitter
```

# Tests

To run the tests on `grapheme-splitter`, use the command below:

```
$ npm test
```

# Usage

Just initialize and use:

```javascript
// Node.js / CommonJS
var GraphemeSplitter = require('grapheme-splitter');
var splitter = new GraphemeSplitter();
var string = "🌷🎁💩";
// split the string into an array of grapheme clusters (one string each)
var graphemes = splitter.splitGraphemes(string);
// or get an iterable iterator over the grapheme clusters (one string each)
var iterator = splitter.iterateGraphemes(string);
// or do this if you just need their number
var graphemeCount = splitter.countGraphemes(string);
```
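
A common reason to split into graphemes is truncating user-visible text without cutting a letter in half. With grapheme-splitter, that could be done as `splitter.splitGraphemes(str).slice(0, max).join("")`. To keep the example below dependency-free, it instead uses the built-in `Intl.Segmenter` (Node.js 16+ or a current browser); `truncateGraphemes` is a hypothetical helper, not part of either API.

```javascript
// Hypothetical helper: keep at most `max` user-perceived characters of `str`.
// Uses the built-in Intl.Segmenter, not grapheme-splitter.
function truncateGraphemes(str, max) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let end = 0;  // code-unit offset just past the last grapheme kept
  let kept = 0; // number of graphemes kept so far
  for (const { segment, index } of segmenter.segment(str)) {
    if (kept === max) break;
    end = index + segment.length;
    kept++;
  }
  return str.slice(0, end);
}

console.log(truncateGraphemes("\u{1F337}\u{1F381}\u{1F4A9}", 2)); // 🌷🎁
console.log(truncateGraphemes("n\u0303o", 1) === "n\u0303");     // true
```

Slicing by the code-unit offset of a segment boundary, rather than by a fixed count of code units, guarantees the cut never falls inside a surrogate pair or between a letter and its combining marks.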

# Examples

```javascript
var GraphemeSplitter = require('grapheme-splitter');
var splitter = new GraphemeSplitter();
// plain Latin alphabet - nothing spectacular
splitter.splitGraphemes("abcd"); // returns ["a", "b", "c", "d"]
// two-char emojis and six-char combined emoji
splitter.splitGraphemes("🌷🎁💩😜👍🏳️‍🌈"); // returns ["🌷","🎁","💩","😜","👍","🏳️‍🌈"]
// diacritics as combining marks, 10 JavaScript chars
splitter.splitGraphemes("Ĺo͂řȩm̅"); // returns ["Ĺ","o͂","ř","ȩ","m̅"]
// individual Korean characters (Jamo), 4 JavaScript chars
splitter.splitGraphemes("뎌쉐"); // returns ["뎌","쉐"]
// Hindi text with combining marks, 8 JavaScript chars
splitter.splitGraphemes("अनुच्छेद"); // returns ["अ","नु","च्","छे","द"]
// demonic multiple combining marks, 75 JavaScript chars
splitter.splitGraphemes("Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞"); // returns ["Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍","A̴̵̜̰͔ͫ͗͢","L̠ͨͧͩ͘","G̴̻͈͍͔̹̑͗̎̅͛́","Ǫ̵̹̻̝̳͂̌̌͘","!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞"]
```

# TypeScript

`grapheme-splitter` includes TypeScript declarations.

```typescript
import GraphemeSplitter = require('grapheme-splitter')
const splitter = new GraphemeSplitter()
const split: string[] = splitter.splitGraphemes('Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞')
```

# Acknowledgements

This library is heavily influenced by Devon Govett's excellent grapheme-breaker CoffeeScript library at https://github.com/devongovett/grapheme-breaker, with an emphasis on ease of integration and a pure JavaScript implementation.