# seek-bzip

[![Build Status][1]][2] [![dependency status][3]][4] [![dev dependency status][5]][6]

`seek-bzip` is a pure-JavaScript Node.js module adapted from [node-bzip](https://github.com/skeggse/node-bzip) and before that [antimatter15's pure-JavaScript bzip2 decoder](https://github.com/antimatter15/bzip2.js). Like these projects, `seek-bzip` only does decompression (see [compressjs](https://github.com/cscott/compressjs) if you need compression code). Unlike those other projects, `seek-bzip` can seek to and decode single blocks from the bzip2 file.

`seek-bzip` primarily decodes buffers into other buffers, synchronously.
With the help of the [fibers](https://github.com/laverdet/node-fibers)
package, it can operate on node streams; see `test/stream.js` for an
example.

## How to Install

```
npm install seek-bzip
```

This package uses
[Typed Arrays](https://developer.mozilla.org/en-US/docs/JavaScript/Typed_arrays), which are present in node.js >= 0.5.5.

## Usage

After compressing some example data into `example.bz2`, the following will recreate that original data and save it to `example`:

```
var Bunzip = require('seek-bzip');
var fs = require('fs');

// Read the whole compressed file into a Buffer.
var compressedData = fs.readFileSync('example.bz2');
// Decode synchronously; the result is a Buffer of decompressed bytes.
var data = Bunzip.decode(compressedData);
fs.writeFileSync('example', data);
```

See the tests in the `test/` directory for further usage examples.

For uncompressing single blocks of bzip2-compressed data, you will need
an out-of-band index listing the start of each bzip2 block. (Presumably
you generate this at the same time as you index the start of the information
you wish to seek to inside the compressed file.) The `seek-bzip` module
has been designed to be compatible with the C implementation `seek-bzip2`
available from https://bitbucket.org/james_taylor/seek-bzip2. That codebase
contains a `bzip-table` tool which will generate bzip2 block start indices.
There is also a pure-JavaScript `seek-bzip-table` tool in this package's
`bin` directory.

## Documentation

`require('seek-bzip')` returns a `Bunzip` object. It contains three static
methods. The first is a function accepting one to three parameters:

`Bunzip.decode = function(input, [Number expectedSize] or [output], [boolean multistream])`

The `input` argument can be a "stream" object (which must implement the
`readByte` method), or a `Buffer`.

If `expectedSize` is not present, `decode` simply decodes `input` and
returns the resulting `Buffer`.

If `expectedSize` is present (and numeric), `decode` will store
the results in a `Buffer` of length `expectedSize`, and throw an error
if the size of the decoded data does not match `expectedSize`.

If you pass a non-numeric second parameter, it can either be a `Buffer`
object (which must be of the correct length; an error will be thrown if
the size of the decoded data does not match the buffer length) or
a "stream" object (which must implement a `writeByte` method).

The optional third `multistream` parameter, if true, attempts to continue
reading past the end of the bzip2 file. This supports "multistream"
bzip2 files, which are simply multiple bzip2 files concatenated together.
If this argument is true, the input stream must have an `eof` method
which returns true when the end of the input has been reached.

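The "stream" objects described above are plain objects exposing a few methods. The following is a minimal sketch of in-memory input and output objects, not code taken from this package: the names `BufferInput` and `ByteSink` are hypothetical, and returning `-1` from `readByte` at end of input is an assumption that should be checked against the package's own stream helpers.

```javascript
// Hypothetical in-memory input object wrapping a Buffer.
function BufferInput(buffer) {
  this.buffer = buffer;
  this.pos = 0;
}
// Returns the next byte, or -1 at end of input (assumed convention).
BufferInput.prototype.readByte = function() {
  return this.pos < this.buffer.length ? this.buffer[this.pos++] : -1;
};
// Required by the documentation when `multistream` is true.
BufferInput.prototype.eof = function() {
  return this.pos >= this.buffer.length;
};

// Hypothetical output object that collects bytes one at a time.
function ByteSink() {
  this.bytes = [];
}
ByteSink.prototype.writeByte = function(byte) {
  this.bytes.push(byte);
};
ByteSink.prototype.toBuffer = function() {
  return Buffer.from(this.bytes);
};

// Exercise the two objects on their own, without the decoder:
// copy the three-byte bzip2 magic from input to output.
var input = new BufferInput(Buffer.from([0x42, 0x5a, 0x68]));
var sink = new ByteSink();
while (!input.eof()) {
  sink.writeByte(input.readByte());
}
console.log(sink.toBuffer().toString('ascii')); // prints "BZh"
```

With the package installed, objects shaped like these would presumably be passed as `Bunzip.decode(new BufferInput(compressedData), new ByteSink(), true)`.
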
The second exported method is a function accepting two or three parameters:

`Bunzip.decodeBlock = function(input, Number blockStartBits, [Number expectedSize] or [output])`

The `input` and `expectedSize`/`output` parameters are as above.
The `blockStartBits` parameter gives the start of the desired block, in bits.

If passing a stream as the `input` parameter, it must implement the
`seek` method.

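Block positions are given in bits because bzip2 blocks are not aligned to byte boundaries. As an illustration only, a bit position can be split into a whole-byte offset plus a count of leftover bits. The helper name `splitBitPosition` is hypothetical and not part of this package; the example value is one of the block positions listed for `test/sample4.bz2` in the Command-line section.

```javascript
// Hypothetical helper: split a block start position given in bits into
// the byte that contains it and the bit offset within that byte.
function splitBitPosition(blockStartBits) {
  return {
    byteOffset: Math.floor(blockStartBits / 8),
    bitOffset: blockStartBits % 8
  };
}

var where = splitBitPosition(2342106);
console.log(where.byteOffset, where.bitOffset); // prints "292763 2"
```
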
The final exported method is a function accepting two or three parameters:

`Bunzip.table = function(input, Function callback, [boolean multistream])`

The `input` and `multistream` parameters are identical to those for the
`decode` method.

This function will invoke `callback(position, size)` once per bzip2 block,
where `position` gives the starting position of the block (in *bits*), and
`size` gives the uncompressed size of the block (in bytes).

This can be used to construct an index allowing direct access to a particular
block inside a bzip2 file, using the `decodeBlock` method.

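One possible shape for such an index is sketched below. It is an illustration under stated assumptions rather than this package's own code: `IndexBuilder` and `findBlock` are hypothetical names, and the callback is fed by hand with the first three rows of the `seek-bzip-table` output for `test/sample4.bz2` shown in the Command-line section, so the block runs without the decoder.

```javascript
// Hypothetical index: one entry per block, where `offset` is the running
// total of uncompressed bytes that precede the block.
function IndexBuilder() {
  this.entries = [];
  this.total = 0;
}
IndexBuilder.prototype.add = function(position, size) {
  this.entries.push({ startBits: position, size: size, offset: this.total });
  this.total += size;
};

// Binary search for the block containing an uncompressed byte offset.
// Returns null when the offset lies past the end of the indexed data.
function findBlock(entries, byteOffset) {
  var lo = 0, hi = entries.length - 1;
  while (lo <= hi) {
    var mid = (lo + hi) >> 1;
    var e = entries[mid];
    if (byteOffset < e.offset) {
      hi = mid - 1;
    } else if (byteOffset >= e.offset + e.size) {
      lo = mid + 1;
    } else {
      return e;
    }
  }
  return null;
}

var index = new IndexBuilder();
// With the package installed this would be roughly:
//   Bunzip.table(compressedData, function(position, size) {
//     index.add(position, size);
//   });
// Here the (position, size) pairs are supplied by hand instead.
[[32, 99981], [320555, 99981], [606348, 99981]].forEach(function(row) {
  index.add(row[0], row[1]);
});

var hit = findBlock(index.entries, 150000);
console.log(hit.startBits, 150000 - hit.offset); // prints "320555 50019"
```

The `startBits` of the entry found this way is the value to pass as `blockStartBits` to `Bunzip.decodeBlock`, and the remainder is the position of the wanted byte inside the decoded block.
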
## Command-line

There are binaries available in the `bin` directory. The first generates an
index of all the blocks in a bzip2-compressed file:

```
$ bin/seek-bzip-table test/sample4.bz2
32 99981
320555 99981
606348 99981
847568 99981
1089094 99981
1343625 99981
1596228 99981
1843336 99981
2090919 99981
2342106 39019
$
```

The first field is the starting position of the block, in bits, and the
second field is the uncompressed size of the block, in bytes.

The second binary decodes an arbitrary block of a bzip2 file:

```
$ bin/seek-bunzip -d -b 2342106 test/sample4.bz2 | tail
élan's
émigré
émigré's
émigrés
épée
épée's
épées
étude
étude's
études
$
```

Use `--help` to see other options.

## Help wanted

Improvements to this module would be generally useful.
Feel free to fork on GitHub and submit pull requests!

## Related projects

* https://github.com/skeggse/node-bzip node-bzip (original upstream source)
* https://github.com/cscott/compressjs
  Lots of compression/decompression algorithms from the same author as this
  module, including bzip2 compression code.
* https://github.com/cscott/lzjb fast LZJB compression/decompression

## License

#### MIT License

> Copyright © 2013-2015 C. Scott Ananian
>
> Copyright © 2012-2015 Eli Skeggs
>
> Copyright © 2011 Kevin Kwok
>
> Permission is hereby granted, free of charge, to any person obtaining
> a copy of this software and associated documentation files (the
> "Software"), to deal in the Software without restriction, including
> without limitation the rights to use, copy, modify, merge, publish,
> distribute, sublicense, and/or sell copies of the Software, and to
> permit persons to whom the Software is furnished to do so, subject to
> the following conditions:
>
> The above copyright notice and this permission notice shall be
> included in all copies or substantial portions of the Software.
>
> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
> LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
> OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
> WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

[1]: https://travis-ci.org/cscott/seek-bzip.png
[2]: https://travis-ci.org/cscott/seek-bzip
[3]: https://david-dm.org/cscott/seek-bzip.png
[4]: https://david-dm.org/cscott/seek-bzip
[5]: https://david-dm.org/cscott/seek-bzip/dev-status.png
[6]: https://david-dm.org/cscott/seek-bzip#info=devDependencies