Ohm-Management - Projektarbeit B-ME

chardet [![Build Status](https://travis-ci.org/runk/node-chardet.png)](https://travis-ci.org/runk/node-chardet)
=====

Chardet is a character encoding detection module for Node.js written in pure JavaScript.
The module is based on the ICU project (http://site.icu-project.org/), which uses character
occurrence analysis to determine the most probable encoding.
## Installation

```
npm i chardet
```
## Usage

To return the encoding with the highest confidence:

```javascript
var chardet = require('chardet');

// Detect from an in-memory buffer (Buffer.from copies the string's bytes)
chardet.detect(Buffer.from('hello there!'));
// or asynchronously from a file
chardet.detectFile('/path/to/file', function(err, encoding) {});
// or synchronously from a file
chardet.detectFileSync('/path/to/file');
```
To return the full list of possible encodings:

```javascript
var chardet = require('chardet');

chardet.detectAll(Buffer.from('hello there!'));
// or
chardet.detectFileAll('/path/to/file', function(err, encoding) {});
// or
chardet.detectFileAllSync('/path/to/file');

// The returned value is an array of objects sorted by confidence in descending order,
// e.g. [{ confidence: 90, name: 'UTF-8' }, { confidence: 20, name: 'windows-1252', lang: 'fr' }]
```
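The full list is useful when the top match should only be trusted above some confidence level. The following is a minimal sketch that works on the result shape shown above. `pickEncoding`, the default threshold of 50, and the `fallback` parameter are hypothetical and not part of chardet.

```javascript
// Hypothetical helper, not part of chardet: choose an encoding name from a
// detectAll-style result (an array of { confidence, name } objects).
function pickEncoding(matches, minConfidence = 50, fallback = 'UTF-8') {
  if (!Array.isArray(matches) || matches.length === 0) return fallback;
  // Find the highest-confidence entry explicitly instead of relying on order.
  const best = matches.reduce((a, b) => (b.confidence > a.confidence ? b : a));
  return best.confidence >= minConfidence ? best.name : fallback;
}

// Sample data copied from the example result above.
const matches = [
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' },
];

console.log(pickEncoding(matches)); // prints "UTF-8"
```

The threshold is an arbitrary assumption; a suitable value depends on how costly a wrong guess is for your data.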
## Working with large data sets

When the data set is large and you want to improve performance at the cost of some accuracy,
you can sample only the first N bytes of the buffer:

```javascript
chardet.detectFile('/path/to/file', { sampleSize: 32 }, function(err, encoding) {});
```
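Conceptually, sampling means the detector only sees a prefix of the data. The sketch below illustrates that idea using only Node's `Buffer` API. `takeSample` is a hypothetical helper, and this is not necessarily how chardet applies `sampleSize` internally.

```javascript
// Hypothetical helper: return at most `sampleSize` leading bytes of a buffer.
function takeSample(buffer, sampleSize) {
  // Without a valid positive size, fall back to the whole buffer.
  if (!Number.isInteger(sampleSize) || sampleSize <= 0) return buffer;
  // subarray returns a view on the same memory, so nothing is copied.
  return buffer.subarray(0, Math.min(sampleSize, buffer.length));
}

const data = Buffer.from('hello there! '.repeat(1000)); // 13000 bytes
const sample = takeSample(data, 32);

console.log(sample.length); // prints 32
```

A small sample gives the detector fewer bytes to analyze, and the cut may fall in the middle of a multi-byte character, which is one reason accuracy can drop.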
## Supported Encodings:

* UTF-8
* UTF-16 LE
* UTF-16 BE
* UTF-32 LE
* UTF-32 BE
* ISO-2022-JP
* ISO-2022-KR
* ISO-2022-CN
* Shift-JIS
* Big5
* EUC-JP
* EUC-KR
* GB18030
* ISO-8859-1
* ISO-8859-2
* ISO-8859-5
* ISO-8859-6
* ISO-8859-7
* ISO-8859-8
* ISO-8859-9
* windows-1250
* windows-1251
* windows-1252
* windows-1253
* windows-1254
* windows-1255
* windows-1256
* KOI8-R

Currently only these encodings are supported; more will be added soon.