|
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991001011021031041051061071081091101111121131141151161171181191201211221231241251261271281291301311321331341351361371381391401411421431441451461471481491501511521531541551561571581591601611621631641651661671681691701711721731741751761771781791801811821831841851861871881891901911921931941951961971981992002012022032042052062072082092102112122132142152162172182192202212222232242252262272282292302312322332342352362372382392402412422432442452462472482492502512522532542552562572582592602612622632642652662672682692702712722732742752762772782792802812822832842852862872882892902912922932942952962972982993003013023033043053063073083093103113123133143153163173183193203213223233243253263273283293303313323333343353363373383393403413423433443453463473483493503513523533543553563573583593603613623633643653663673683693703713723733743753763773783793803813823833843853863873883893903913923933943953963973983994004014024034044054064074084094104114124134144154164174184194204214224234244254264274284294304314324334344354364374384394404414424434444454464474484494504514524534544554564574584594604614624634644654664674684694704714724734744754764774784794804814824834844854864874884894904914924934944954964974984995005015025035045055065075085095105115125135145155165175185195205215225235245255265275285295305315325335345355365375385395405415425435445455465475485495505515525535545555565575585595605615625635645655665675685695705715725735745755765775785795805815825835845855865875885895905915925935945955965975985996006016026036046056066076086096106116126136146156166176186196206216226236246256266276286296306316326336346356366376386396406416426436446456466476486496506516526536546556566576586596606616626636646656666676686696706716726736746756766776786796806816826836846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003 |
- # printj
-
- Extended `sprintf` implementation (for the browser and nodejs). Emphasis on
- compliance, performance and IE6+ support.
-
- ```js
- PRINTJ.sprintf("Hello %s!", "World");
- ```
-
- A self-contained specification of the printf format string is included below in [this README](#printf-format-string-specification), as well as a summary of the
- [support against various printf implementations](#support-summary)
-
-
- ## Installation
-
- With [npm](https://www.npmjs.org/package/printj):
-
- ```bash
- $ npm install printj
- ```
-
- In the browser:
-
- ```html
- <script src="printj.js"></script>
- ```
-
- The browser exposes a variable `PRINTJ`
-
- When installed globally, npm installs a script `printj` that renders the format
- string with the given arguments. Running the script with `-h` displays help.
-
- The script will manipulate `module.exports` if available (e.g. in a CommonJS
- `require` context). This is not always desirable. To prevent the behavior,
- define `DO_NOT_EXPORT_PRINTJ`
-
- ## Usage
-
- In all cases, the relevant function takes a format and arguments to be rendered.
-
- The return value is a JS string.
-
- - `PRINTJ.sprintf(format, ...args)` assumes the arguments are passed directly
-
- - `PRINTJ.vsprintf(format, argv)` assumes the arguments are passed in an array
-
- For example:
-
- ```js
- > // var PRINTJ = require('printj'); // uncomment this line if in node
- > var sprintf = PRINTJ.sprintf, vsprintf = PRINTJ.vsprintf;
- > sprintf("Hello %s", "SheetJS") // 'Hello SheetJS'
- > sprintf("%d + %d = %d", 2,3,2+3) // '2 + 3 = 5'
- > vsprintf("%d + %d = %d", [2,3,5]) // '2 + 3 = 5'
- > sprintf("%1$02hhx %1$u %1$i %1$o", -69) // 'bb 4294967227 -69 37777777673'
- ```
-
- The command line script takes a format and arguments:
-
- ```
- usage: printj [options] <format> [args...]
-
- Options:
- -h, --help output usage information
- -d, --dump print debug information about format string
-
- Arguments are treated as strings unless prefaced by a type indicator:
- n:<integer> call parseInt (ex. n:3 -> 3)
- f:<float> call parseFloat (ex. f:3.1 -> 3.1)
- b:<boolean> false when lowercase value is "FALSE" or "0", else true
- s:<string> interpret as string (ex. s:n:3 -> "n:3")
- j:<JSON> interpret as an object using JSON.parse
- e:<JS> evaluate argument (ex. e:1+1 -> 2, e:"1"+1 -> "11")
-
- samples:
- $ printj '|%02hhx%d|' n:50 e:0x7B # |32123|
- $ printj '|%2$d + %3$d is %1$d|' e:1+2 n:1 n:2 # |1 + 2 is 3|
- $ printj '|%s is %s|' s:1+2 e:1+2 # |1+2 is 3|
- $ printj '|%c %c|' s:69 n:69 # |6 E|
- ```
-
- ## Testing
-
- `make test` will run the nodejs-based test.
-
- `make stress` will run a larger test encompassing every possible conversion. It
- requires access to a C compiler.
-
- ## License
-
- Please consult the attached LICENSE file for details. All rights not explicitly
- granted by the Apache 2.0 license are reserved by the Original Author.
-
- ## Badges
-
- [![Build Status](https://saucelabs.com/browser-matrix/printj.svg)](https://saucelabs.com/u/printj)
-
- [![Build Status](https://travis-ci.org/SheetJS/printj.svg?branch=master)](https://travis-ci.org/SheetJS/printj)
-
- [![Coverage Status](http://img.shields.io/coveralls/SheetJS/printj/master.svg)](https://coveralls.io/r/SheetJS/printj?branch=master)
-
- [![NPM Downloads](https://img.shields.io/npm/dt/printj.svg)](https://npmjs.org/package/printj)
-
- [![Dependencies Status](https://david-dm.org/sheetjs/printj/status.svg)](https://david-dm.org/sheetjs/printj)
-
- [![ghit.me](https://ghit.me/badge.svg?repo=sheetjs/printj)](https://ghit.me/repo/sheetjs/printj)
-
- [![Analytics](https://ga-beacon.appspot.com/UA-36810333-1/SheetJS/printj?pixel)](https://github.com/SheetJS/printj)
-
- # printf format string specification
-
- The `printf` family of functions attempt to generate and output a string of
- characters from a series of arguments, following a user-supplied "format string"
- specification. The format string contains normal characters that are written to
- the output string as well as specifiers that describe which parameter to insert
- and how to render the parameter. This specification describes how a conformant
- implementation should process the format string and generate an output string.
- Any discrepancies between this document and the reference implementation are
- considered bugs in the implementation.
-
- ### Original C Interface
-
- Every function in the `printf` family follows the same logic to generate strings
- but have different interfaces reflecting different input and output behaviors.
- Some functions have wide variants that use wide `wchar_t *` strings rather than
- normal C `char *`. The following variants are required by the POSIX spec:
-
- | function | max length | output destination | vintage | wide ver |
- |------------|------------|-----------------------|---------|------------|
- | `printf` | unbounded | standard output | K&R | `wprintf` |
- | `fprintf` | unbounded | stream (`FILE *`) | K&R | `fwprintf` |
- | `sprintf` | unbounded | string (`char *`) | K&R | `swprintf` |
- | `snprintf` | parameter | string (`char *`) | C99 | |
- | `dprintf` | unbounded | POSIX file descriptor | POSIX | |
-
- Each function has a dual function, whose name begins with `v`, that accepts the
- parameters as a `va_list` rather than formal parameters. In all cases, they
- return the number of characters written or a negative value to indicate error:
-
- ```C
- int sprintf(char *ostr, const char *fmt, ...);
- int vsprintf(char *ostr, const char *fmt, va_list arg_list);
-
- int swprintf(wchar_t *ostr, const wchar_t *fmt, ...);
- int vswprintf(wchar_t *ostr, const wchar_t *fmt, va_list arg_list);
- ```
-
- ### JS and C strings
-
- C "strings" are really just arrays of numbers. An external code page (such as
- ASCII) maps those numbers to characters. K&R defines two types of strings:
- basic character set strings (`char *`) and extended character set strings
- (`wchar_t *`). In contrast, JS has a true string value type.
-
- Unlike in C, JS strings do not treat the null character as an end-of-string
- marker. As a result, characters beyond the first null character will be used.
-
- The JS equivalent of a C extended string would be an array of the individual
- character codes. The C basic string equivalent would involve specifying a code
- page and mapping back. The `codepage` JS library supports common codepages.
-
- While capturing the essence of C strings, using arrays of character codes is not
- idiomatic JS. Few developers leverage this and the downsides far exceed the
- benefits of a more direct translation. The effect can be feigned, as shown in
- the `js2c` code sample at the end of the document.
-
- ### JS Interface
-
- In the absence of a standard output or even a standard concept of a stream, the
- non-string outputs are irrelevant. Similarly there is no JS analogue of wide
- characters. While useful, lack of direct memory management obviates `snprintf`.
- This implementation exports the remaining functions, `sprintf` and `vsprintf`.
-
- Instead of replicating the original C signature and `errno`, functions directly
- return the output string and throw Errors:
-
- ```typescript
- function sprintf(fmt:string, ...args):string;
- function vsprintf(fmt:string, args:Array<any>):string;
- ```
-
- The C functions return the number of characters written to the string, which is
- directly accessible in JS via the `length` property. A direct replica of the
- various string functions are included at the end of the document.
-
- ## Specifier heritage and regular expression
-
- Note: The regular expressions follow perl `/x` style. Whitespace characters
- outside of character classes are ignored. `#` is a comment character and every
- character until the end of the line is ignored. To convert to a standard regex:
-
- ```js
- regex_string.replace(/#.*$/gm,"").replace(/^\s*/gm,"").replace(/\s*\n/gm,"");
- ```
-
- Based on K&R, conversions originally followed the format:
-
- - required: leading `%`
- - optional: `-` (POSIX refers to this as the "flags")
- - optional: positive number or `*` (POSIX "width")
- - optional: period followed by positive number or `*` (POSIX "precision")
- - optional: an `h` or `l` to indicate size of data (POSIX "length")
- - required: character describing output behavior (POSIX "conversion specifier")
-
- This is captured by the regular expression:
-
- ```perl
- /%(?:
- ([-])? # flags (only minus sign)
- (\d+|\*)? # width
- (?:\.(\d+|\*))? # period + precision
- ([hl])? # length
- ([dioxXucsfeEgGp%]) # conversion specifier
- )/x
- ```
-
- Various implementations of `printf` have added different functionality.
-
- ANSI standards up through C99:
-
- - more flags `"+"` `" "` `"0"` `"#"`
- - more lengths `"L"` `"hh"` `"ll"` `"j"` `"z"` `"t"`
- - more conversions `"F"` `"a"` `"A"` `"n"`
-
- The POSIX specification of `printf` added:
-
- - positional parameters to identify argument indices
- - more flags `"'"`
- - more conversions `"C"` `"S"`
- - clarifications regarding corner cases and "undefined behavior"
-
- BSD implementations added:
-
- - more lengths `"q"`
- - more conversions `"D"` `"U"` `"O"`
-
- glibc (GNU) added:
-
- - more lengths `"Z"`
- - more conversions `"m"`
-
- Windows C Runtime (CRT) added:
-
- - more lengths `"I"` `"I32"` `"I64"` `"w"`
-
- glibc and CRT both added `Z`. glibc uses `Z` for the length `size_t`. CRT uses
- `Z` as a conversion for length-prefixed strings. This implementation takes the
- former approach, handling `Z` in the same way as `z`.
-
- BSD and IBM C library both added `D`. BSD uses `D` as a conversion, namely as
- an alias of `ld`. IBM uses `D` for the length for `_Decimal64`, a decimal
- floating point type, in accordance with ISO/IEC TR 24732. This implementation
- takes the former approach.
-
- This implementation also adds new conversions:
-
- - `"b"` and `"B"` for binary (base-2) integer renderings
- - `"y"` and `"Y"` for true/false and yes/no Boolean conversions
- - `"J"` for JSON
- - `"T"` and `"V"` for JS typeof and valueOf inspection
-
- Combining the various extensions yields the following regular expression:
-
- ```perl
- /%(?:
- %| # literal %% (flags etc prohibited)
- ([1-9]\d*\$)? # positional parameter
- ([-+ 0\x23\x27]*)? # flags
- ([1-9]\d*|\*(?:[1-9]\d*\$)?)? # width
- (?:\.(\d+|\*(?:[1-9]\d*\$)?))? # precision
- (hh?|ll?|[LzjtqZIw])? # length
- ([diouxXfFeEgGaAcCsSpnDUOmbByYJVT]) # conversion specifier
- )/x
- ```
- This implementation explicitly does not support certain non-standard extensions:
-
- - AltiVec vector length extensions (`v` with `h`/`l`/`ll`):
- - CRT fixed width lengths `I32` and `I64`
-
- ## Conversion Specifier Quick Reference Table
-
- | C | Type | Summary |
- |-----|:--------:|-------------------------------------------------------------|
- | `a` | floating | base-2 exp form w/ hex mantissa and dec exponent, lowercase |
- | `A` | floating | base-2 exp form w/ hex mantissa and dec exponent, uppercase |
- | `b` | extended | cast to C `unsigned int`, standard form binary |
- | `B` | extended | cast to C `unsigned long`, standard form binary |
- | `c` | text | print `latin-1` char from number OR first char of string |
- | `C` | text | print `UCS-2` char from number OR first char of string |
- | `d` | integral | cast to C `int`, standard form decimal |
- | `D` | integral | cast to C `long`, standard form decimal |
- | `e` | floating | base-10 exp form w/dec mantissa and dec exponent, lowercase |
- | `E` | floating | base-10 exp form w/dec mantissa and dec exponent, uppercase |
- | `f` | floating | base-10 decimal form, lowercase extended values |
- | `F` | floating | base-10 decimal form, uppercase extended values |
- | `g` | floating | print using `e` or `f` conversion based on value/precision |
- | `G` | floating | print using `E` or `F` conversion based on value/precision |
- | `i` | integral | cast to C `int`, standard form decimal (alias of `d`) |
- | `J` | extended | prints objects using JSON or `util.inspect` |
- | `m` | misc | prints info about Error objects (JS equivalent of `errno`) |
- | `n` | misc | do not print! stores number of chars written to arg `.len` |
- | `o` | integral | cast to C `unsigned int`, standard form octal |
- | `O` | integral | cast to C `unsigned long`, standard form octal |
- | `p` | misc | print `"l"` field of object (fake pointer) |
- | `s` | text | print string argument |
- | `S` | text | print string argument (alias of `"s"`) |
- | `T` | extended | print type information (`typeof` or `Object toString`) |
- | `u` | integral | cast to C `unsigned int`, standard form decimal |
- | `U` | integral | cast to C `unsigned long`, standard form decimal |
- | `V` | extended | print primitive value (`valueOf`) |
- | `x` | integral | cast to C `unsigned int`, standard form hex, lowercase |
- | `X` | integral | cast to C `unsigned long`, standard form hex, uppercase |
- | `y` | extended | prints `true`/`false` or `yes`/`no` based on Boolean value |
- | `Y` | extended | prints `TRUE`/`FALSE` or `YES`/`NO` based on Boolean value |
- | `%` | misc | print the literal `%` character |
-
- ## Parameter Selection
-
- The default behavior is to consume arguments in order:
-
- ```C
- printf("Count to 3: %d %d %d", 1, 2, 3); // Count to 3: 1 2 3
- ```
-
- POSIX `printf` permits explicit argument selection, bypassing the standard
- behavior of using the arguments in order. To select the `n`-th argument, use
- `n$` immediately after the `%` token to select an argument for the conversion:
-
- ```C
- printf("%d %d %d", 1, 2, 3); // 1 2 3 (implicit order 1, 2, 3 )
- printf("%1$s %2$s %3$s", "a", "b", "c"); // a b c (explicit order 1, 2, 3 )
- printf("%1$s %3$s %2$s", "a", "b", "c"); // a c b (explicit order 1, 3, 2 )
- ```
-
- The POSIX standard asserts that mixing positional and non-positional conversions
- is undefined behavior. This implementation handles mixing by tracking the index
- for non-positional conversions:
-
- ```C
- printf("%s %4$s %s %5$s %s", "a", "b", "c", "d", "e"); // a d b e c
- ```
-
- The POSIX standard requires that if an argument is used in the format, every
- preceding argument must be used. This implementation relaxes that requirement:
-
- ```C
- printf("%3$s", "a", "b", "c"); // c (technically invalid since "a"/"b" unused)
- ```
-
- ## Dynamic Specifiers
-
- The width and precision specifiers may include the dynamic specifier `*` which
- instructs the engine to read the next argument (assumed to be an integer). Just
- as with the positional parameter, `idx$` immediately after the `*` token selects
- the numeric argument.
-
- For example:
-
- ```C
- printf("|%5s|", "sheetjs"); // |sheetjs| (width = 5)
- printf("|%*s|", 5, "sheetjs"); // |sheetjs| (width first argument)
- printf("|%2$*1$s|", 5, "sheetjs", 10); // |sheetjs| (width is argument #1)
-
- printf("|%10s|", "sheetjs"); // | sheetjs| (width = 10)
- printf("|%2$*3$s|", 5, "sheetjs", 10); // | sheetjs| (width is argument #3)
- ```
-
- Arguments are generally consumed in order as presented in the format string:
-
- ```C
- printf("|%s|", val);
- printf("|%*s|", width, val);
- printf("|%.*s|", prec, val);
- printf("|%*.*s|", width, prec, val);
- printf("|%0*.*d|", 4, 2, 1); // | 01| width=4 prec=2 value=1
- ```
-
- Positional arguments can be applied to width and precision:
-
- ```C
- printf("|%*.*d|", width, prec, val);
- printf("|%2$0*3$.*1$d|", prec, val, width);
- printf("|%0*.*d|", 4, 2, 1); // | 01| width=4 prec=2 value=1 flags='0'
- printf("|%1$0*3$.*2$d|", 1, 2, 4); // | 01| width=4 prec=2 value=1 flags='0'
- ```
-
- A negative width is interpreted as the `-` flag with a positive width:
-
- ```C
- printf("|%*.*d|", 4, 2, 1); // | 01| width=4 prec=2 value=1 flags=''
- printf("|%-*.*d|", 4, 2, 1); // |01 | width=4 prec=2 value=1 flags='-'
- printf("|%*.*d|", -4, 2, 1); // |01 | width=4 prec=2 value=1 flags='-'
- printf("|%-*.*d|", -4, 2, 1); // |01 | width=4 prec=2 value=1 flags='-'
- ```
-
- A negative precision is discarded:
-
- ```C
- printf("|%*s|", 4, "sheetjs"); // |sheetjs| width=4
- printf("|%*.*s|", 4, 3, "sheetjs"); // | she| width=4 prec=3
- printf("|%*.*s|", 4, 2, "sheetjs"); // | sh| width=4 prec=2
- printf("|%*.*s|", 4, 1, "sheetjs"); // | s| width=4 prec=1
- printf("|%*.*s|", 4, 0, "sheetjs"); // | | width=4 prec=0
- printf("|%*.*s|", 4, -1, "sheetjs"); // |sheetjs| width=4 (prec ignored)
- ```
-
-
- # C Data Model
-
- JS has one numeric type `Number` which represents an IEEE754 double-precision
- (64-bit) floating point number. C has a multitude of numeric types, including
- floating point as well as integer types. The sizes of those data types are
- implementation-dependent. A "C data model" specifies the sizes of the core C
- data types.
-
- ### Integer Types
-
- POSIX `printf` specification references 8 integer types in integer conversions:
-
- | C data type | fmt | unsigned type | fmt | signed type | fmt |
- |-------------|------:|----------------------|------:|---------------|------:|
- | `char` | | `unsigned char` | `hhu` | `signed char` | `hhd` |
- | `short` | `hd` | `unsigned short` | `hu` | | |
- | `int` | `d` | `unsigned int` | `u` | | |
- | `long` | `ld` | `unsigned long` | `lu` | | |
- | `long long` | `lld` | `unsigned long long` | `llu` | | |
- | `size_t` | `zu` | | | `ssize_t` | `zd` |
- | `intmax_t` | `jd` | `uintmax_t` | `ju` | | |
- | `ptrdiff_t` | `td` | | | | |
-
- C99 does not officially define a signed `size_t` or unsigned `ptrdiff_t` type.
- POSIX does define `ssize_t` but no equivalent `uptrdiff_t`.
-
- BSD additionally recognizes the types `quad_t` and `u_quad_t`, which this
- implementation treats as `long long int` and `unsigned long long int`.
-
- ### Character and String Types
-
- Two integer types are used in character and string conversions:
-
- | type | fmt |
- |-------------|------:|
- | `wchar_t` | `ls` |
- | `wint_t` | `lc` |
-
- Both wide types `wchar_t` and `wint_t` can be signed or unsigned according to
- C99. Both types are used only in character and string conversions. Based on
- K&R "printable characters are always positive", the types are assumed unsigned.
-
- ### Floating Point Number Types
-
- K&R recognizes 3 floating point types. C99 later tied it to IEC 60559:
-
- | C data type | precision | total bits | exponent | mantissa | fmt |
- |:--------------|:----------|:----------:|:--------:|:--------:|------:|
- | `float` | single | `32` | `8` | `23` | |
- | `double` | double | `64` | `11` | `52` | `f` |
- | `long double` | extended | `80` | `15` | `64` | `Lf` |
-
- ## Implementation
-
- Numerous "C data models", specifying the bit/byte sizes of the various types,
- have been and continue to be used. For example, OSX and other modern 64-bit
- UNIX flavors use the "LP64" C data model. 64-bit Windows currently uses the
- "LLP64" model. 32-bit systems generally use the "ILP32" model. The 8-bit byte
- sizes for the various types under the various models are defined in ctypes.json
- in the `Models` object as per the following table:
-
- | type | ctypes.json | LP64 | ILP32 | LLP64 |
- |-------------|-------------|-----:|------:|------:|
- | `char` | `char` | 1 | 1 | 1 |
- | `short` | `short` | 2 | 2 | 2 |
- | `int` | `int` | 4 | 4 | 4 |
- | `long` | `long` | 8 | 4 | 4 |
- | `long long` | `longlong` | 8 | 8 | 8 |
- | `wchar_t` | `wchar_t` | 4 | 4 | 2 |
- | `wint_t` | `wint_t` | 4 | 4 | 2 |
- | `size_t` | `size_t` | 8 | 4 | 8 |
- | `intmax_t` | `intmax_t` | 8 | 8 | 8 |
- | `ptrdiff_t` | `ptrdiff_t` | 8 | 4 | 8 |
-
- By default the source assumes the LP64 data model. Other data models are
- supported in the source tree, controlled by the JSFLAGS variable in the build
- process. Set the `JS_MODEL` variable to the desired index as specified in the
- `ModelNames` array in `bits/ctype.json`:
-
- ```bash
- $ <bits/ctypes.json jq -r '.ModelNames|.[]' # LP64 ILP32 LLP64
- $ JSFLAGS=-DJS_MODEL=0 make # LP64
- $ JSFLAGS=-DJS_MODEL=1 make # ILP32
- $ JSFLAGS=-DJS_MODEL=2 make # LLP64
- ```
-
- To create a custom model, add the spec to `bits/ctypes.json` by appending the
- model name to the end of the `ModelNames` array and adding an entry to the
- `Models` object. The current models are defined as follows:
-
- ```json
- {
- "ModelNames":["LP64", "ILP32", "LLP64"],
- "Models": {
- "LP64": { "char":1, "short":2, "int":4, "long":8, "longlong":8, "wint_t":4, "wchar_t":4, "size_t":8, "intmax_t":8, "ptrdiff_t":8 },
- "ILP32": { "char":1, "short":2, "int":4, "long":4, "longlong":8, "wint_t":4, "wchar_t":4, "size_t":4, "intmax_t":8, "ptrdiff_t":4 },
- "LLP64": { "char":1, "short":2, "int":4, "long":4, "longlong":8, "wint_t":2, "wchar_t":2, "size_t":8, "intmax_t":8, "ptrdiff_t":8 }
- }
- }
- ```
-
- # Integer Conversions
-
- This section covers the conversions `diouxXDUO`. The base-2 conversions `bB`
- are an extension and are discussed at the end, but the same basic rules apply.
-
- JS has one Number type (representing an IEEE754 8-byte floating point number)
- that is capable of representing a 32-bit integer. It cannot represent the full
- range of 64-bit integers exactly. Care is taken to avoid operations that may
- inadvertently result in a conversion to a smaller integral type.
-
- ## Restricting Integer Values
-
- JS Bitwise operations convert numbers to 32-bit integers before performing
- operations. With the exception of the unsigned right shift operator `>>>`, all
- operations act on signed integers. For example:
-
- ```js
- Math.pow(2,31) | 0; // -2147483648 == -Math.pow(2,31)
- (Math.pow(2,32)-2) ^ 0; // -2
- -1 >>> 0 // 4294967295 == Math.pow(2,32) - 1
- ```
-
- JS Number can exactly represent every integer in the range `-2^53 .. 2^53`. For
- lengths exceeding 32 bits, `Math.round` is appropriate.
-
- | bits | unsigned | signed |
- |------|---------------------------|-------------------------------------------|
- | 8 | `V & 0xFF` | `V &= 0xFF; if(V > 0x7F) V-= 0x100` |
- | 16 | `V & 0xFFFF` | `V &= 0xFFFF; if(V > 0x7FFF) V-= 0x10000` |
- | 32 | `V >>> 0` | `V | 0` |
- | 64 | `Math.abs(Math.round(V))` | `Math.round(V)` |
-
- ## Length Specifiers for Integer Conversions
-
- When a length specifier implies a certain size (such as `hh` for a single-byte
- integer), the number will be converted before rendering strings. For example:
-
- ```C
- printf("%1$02hhx %1$02hx %1$02lx %1$02llx", 256); // |00 100 100 100|
- printf("%1$02hhx %1$02hx %1$02lx %1$02llx", 4096); // |00 1000 1000 1000|
- printf("%1$02hhx %1$02hx %1$02lx %1$02llx", 65536); // |00 00 10000 10000|
- ```
-
- Values are restricted by first limiting the result to a specified number of
- bytes (appropriate bit-and) and then adding or subtracting to ensure the value
- is signed or unsigned according to the conversion specifier. If a length is
- specified, it overrides the implied length of the conversion. The following
- table describes the behavior of this implementation:
-
- | implied C type | ctypes.json | length | conv default |
- |:------------------------------------|:------------|:------:|:-------------|
- | `int` or `unsigned int` | `int` | (none) | d i o u x X |
- | `char` or `unsigned char` | `char` | hh |
- | `short` or `unsigned short` | `short` | h |
- | `long` or `unsigned long` | `long` | l | D U O |
- | `long long` or `unsigned long long` | `longlong` | L ll q |
- | `intmax_t` or `uintmax_t` | `intmax_t` | j |
- | `size_t` or `ssize_t` | `size_t` | z Z |
- | `ptrdiff_t` or unsigned variant | `ptrdiff_t` | t |
-
- ## Rendering Unsigned Integers in Base 10 ("u" and "U" conversions)
-
- `num.toString(10)` produces the correct result for exact integers.
-
- `"u"` conversion restricts values to `int`; `"U"` restricts to `long`.
-
- ## Rendering Unsigned Integers in Base 8 ("o" and "O" conversions)
-
- Even though `num.toString(8)` is implementation-dependent, all browser
- implementations use standard form for integers in the exact range.
-
- The alternate form (`#`) prints a `"0"` prefix.
-
- `"o"` conversion restricts values to `int`; `"O"` restricts to `long`.
-
- ## Rendering Unsigned Integers in Base 16 ("x" and "X" conversions)
-
- Even though `num.toString(16)` is implementation-dependent, all browser
- implementations use standard form for integers in the exact range.
-
- The alternate form (`#`) prints a `"0x"` or `"0X"` prefix.
-
- Unlike `"U" "O" "D"`, `"X"` conversion uses `A-F` instead of `a-f` in hex.
-
- ## Rendering Signed Integers in Base 10 ("d" "i" and "D" conversions)
-
- `num.toString(10)` produces the correct result for exact integers. The flags
- `" +"` control prefixes for positive integers.
-
- `"di"` conversions restrict values to `int`; `"D"` restricts to `long`.
-
-
- # Floating Point Conversions
-
- This section covers the conversions `fFeEgGaA`.
-
- Due to C variadic argument promotion rules, `float` types are always promoted to
- `double`. None of the conversions or length specifiers signal that an argument
- is to be interpreted as a `float`. There is no JS canonical representation of
- an extended floating point number, so JS `Number` suffices.
-
- ## Infinity, NaN, and Negative Zero
-
- JS recognizes a few special IEEE754 values, as described in the following table:
-
- | JS value | JS Expression | Description |
- |------------:|:--------------|:-----------------------------------------------|
- | `Infinity` | `1./0.` | Positive limiting value `lim{x->0+} 1/x` |
- | `-Infinity` | `-1./0.` | Negative limiting value `lim{x->0+} -1/x` |
- | `NaN` | `0./0.` | Placeholder for "not-a-number" e.g. `0./0.` |
- | `-0.` | `-1/Infinity` | Negative limiting value `lim{x->0-} x` |
-
- JS `Number` methods render different strings from the POSIX spec:
-
- | JS value | POSIX string | JS string |
- |------------:|:----------------------------------------------|--------------:|
- | `Infinity` | `"inf" "INF"` or `"infinity" "INFINITY"` | `"Infinity"` |
- | `-Infinity` | `"-inf" "-INF"` or `"-infinity" "-INFINITY"` | `"-Infinity"` |
- | `NaN` | `"[-]nan" "[-]NAN"` w/opt parenthesized chars | `"NaN"` |
- | `-0.` | uses negative sign (e.g. `"-0"` under `"%f"`) | same as `+0.` |
-
- This implementation performs the required adjustments.
-
- ## Exponential Form ("e" and "E" conversions)
-
- Aside from the special cases discussed above, JS `num.toExponential(prec)`
- differs from POSIX `printf("%1$.*2$e", num, prec)` in the exponent field: JS
- writes exponents with the fewest digits (POSIX requires 2+ digits). This is
- easily fixed by inspecting the output string and inserting a "0" when needed.
-
- The optional `#` flag forces the decimal point to appear when precision is 0.
- This is also easily corrected by adding a decimal point just before the "e".
-
- ## Standard Form ("f" and "F" conversions)
-
- The POSIX spec only requires that the number of digits after the decimal point
- is equal to the precision. It does not specify how many digits appear before
- the decimal point, nor does it specify how to handle numbers that cannot be
- exactly represented.
-
- For values less than `1e21` the JS `num.toFixed(n)` generally matches `%f` with
- the specified precision. However, for larger values `toFixed` defaults to the
- exponential form.
-
- ## Value-dependent Form ("g" and "G" conversions)
-
- The final form (exponential or standard) is determined based on the value. The
- threshold is different from the JS `toString` / `toPrecision` thresholds and
- depends on the specified precision as well as the base-10 exponent:
-
- | Value | `"%.3g"` | `toPrecision(3)` |
- |----------:|:-----------|:-----------------|
- | 1.2345e-4 | `0.000123` | `0.000123` |
- | 1.2345e-5 | `1.23e-05` | `0.0000123` |
- | 1.2345e-6 | `1.23e-06` | `0.00000123` |
- | 1.2345e-7 | `1.23e-07` | `1.23e-7` |
-
- According to JS spec, `toPrecision` uses standard form when `precision > E` and
- `E >= -6`. For printf standard form is used when `precision > E` and `E >= -4`.
-
- ## Hex-Mantissa Decimal-Binary-Exponent Form ("a" and "A" conversions)
-
- A general exponential form involves 3 parameters: radix of the mantissa, base of
- the exponent expression, and radix of the exponent expression. The standard
- exponential form uses decimal for all three parts. For base 16, there are quite
- a few reasonable combinations. Consider the value `1.234567e-80`:
-
- | Mant | Exp Base | Radix-10 (sigil `";"`) | Radix-16 (sigil `";"`) |
- |:----:|:--------:|:-----------------------|:-----------------------|
- | 10 | 10 | `1.234567;-80` | `1.234567;-50` |
- | 16 | 10 | `1.3c0c9539b8887;-80` | `1.3c0c9539b8887;-50` |
- | 16 | 16 | `5.daf8c8f5f4104;-67` | `5.daf8c8f5f4104;-43` |
- | 16 | 4 | `1.76be323d7d041;-133` | `1.76be323d7d041;-85` |
- | 16 | 2 | `1.76be323d7d041;-266` | `1.76be323d7d041;-10a` |
-
- POSIX `"%a"` uses a hex mantissa (16), decimal exponent radix (10), and binary
- exponent base (2). The general normalized form requires that the integral part
- of the mantissa to exceed 0 and not to exceed `exponent base - 1` except in the
- special case of `0`. The sigil is `p` and exponent sign is always used.
-
- JS `num.toString(radix)` is implementation-dependent for valid non-10 radices
- (`2-9, 11-36`). IE uses hex-mantissa decimal-hex-exponent form when the
- absolute value of the base-2 exponent exceeds 60. Otherwise, IE uses an exact
- standard hexadecimal form. Chrome, Safari and other browsers always use the
- exact standard hexadecimal form. Both forms are easily converted to `"%a"` by
- calculating and dividing by the appropriate power of 2.
-
- For each non-zero normal floating point value, there are 4 acceptable strings
- that represent the value, derived by multiplying the normalized value by powers
- of 2 and adjusting the exponent accordingly:
-
- | Value | Normalized | Alternate `*2` | Alternate `*4` | Alternate `*8` |
- |:--------|:---------------|:---------------|:---------------|:---------------|
- | `1` | `1p+0` | `2p-1` | `4p-2` | `8p-3` |
- | `.2` | `1.9999999p-3` | `3.3333333p-4` | `6.6666666p-5` | `c.cccccccp-6` |
- | `.69` | `1.6147ae1p-1` | `2.c28f5c2p-2` | `5.851eb85p-3` | `b.0a3d70ap-4` |
- | `6.e20` | `1.043561p+69` | `2.086ac3p+68` | `4.10d586p+67` | `8.21ab0dp+66` |
-
- JS engines follow the glibc model: multiply by a suitable power of 16 so that
- the mantissa is between 1 and 16, render left to right one digit at a time, then
- fix the result at the end. FreeBSD and OSX always show the normalized form.
- This implementation defaults to the normalized form. To switch to the glibc
- form, define `DO_NOT_NORMALIZE` in the `JSFLAGS` variable when building:
-
- ```bash
- $ JSFLAGS=-DDO_NOT_NORMALIZE make
- ```
-
- # Character Conversions
-
- This section covers the conversions `sScC`.
-
- ## Rendering Strings ("s" and "S" conversions)
-
- JS has no concept of "wide strings" (`wchar_t *` in C), so the length modifiers
- are ignored. `s` and `S` are treated as equivalent.
-
- Arguments are first interpreted as strings by calling the `String` function.
- Implementing `toString` on the argument to be converted may lead to unexpected
- results:
-
- ```C
- var O = {valueOf:function() {return 456;}, toString:function() {return "123"}};
- printf("%1$s %1$d", O); // "123 456"
- ```
-
- If a positive precision is specified, up to that many characters will be taken
- from the string. Otherwise the entire string will be used:
-
- ```C
- printf("|%s|", "sheetjs"); // '|sheetjs|' (no precision)
- printf("|%.9s|", "sheetjs"); // '|sheetjs|' (string shorter than precision)
- printf("|%.5s|", "sheetjs"); // '|sheet|' (string truncated)
- ```
-
- Lengths are measured using the JS string length accessor. Since there is no
- attempt to correct for multi-character sequences like combining marks, the
- results may be unexpected:
-
- ```C
- printf("%.1s","ñ"); // 'n' not "ñ"
- ```
-
- If the width is specified and is greater than the width of the string to be
- rendered, padding will be applied. If the `"-"` flag is specified, then the
- string will be right-padded, otherwise it will be left-padded. If the `"0"`
- flag is specified, the final string is left-padded with zeroes. The `"-"` flag
- takes precedence over `0`.
-
- ```C
- printf( "|%s|", "sheetjs"); // '|sheetjs|' (no width)
- printf( "|%5s|", "sheetjs"); // '|sheetjs|' (string longer than width)
- printf( "|%9s|", "sheetjs"); // '| sheetjs|' (no flag = left pad spaces)
- printf( "|%09s|", "sheetjs"); // '|00sheetjs|' ("0" = left pad "0")
- printf( "|%-9s|", "sheetjs"); // '|sheetjs |' ("-" = right pad space)
- printf("|%-09s|", "sheetjs"); // '|sheetjs |' ("0" ignored)
- ```
-
- ## Rendering Characters ("c" and "C" conversions)
-
- JS has no concept of "wide characters" (`wchar_t` in C). The length modifier is
- used in determining whether the number should be interpreted as one or two
- 16-bit character codes (when the "C" format or the "l" or "ll" specifiers are
- used) or a single 8-bit char code. Precision and flags are ignored.
-
- # Non-Numeric Conversions
-
- ## The literal "%" symbol ("%" conversion)
-
- All other parameters are ignored.
-
- ## Interpreting and Rendering Pointers ("p" conversion)
-
- JS has no true concept of pointers. In array and typed array contexts, it is
- common to associate a position object that stores the address relative to the
- start of the array. This implementation reads the `l` key and interprets as a
- 32-bit or 52-bit unsigned integer depending on `size_t` in the data model.
-
- The normal output format is equivalent to `"%#x"` but the alternate form emits
- using the `"%d"` format. When the pointer is invalid, `-1` is rendered. Only
- the `"#"` flag is interpreted.
-
- ```js
- var x = {}, y = {l:3};
- printf("%1$p %1$#p", y); // 0x3 3
- printf("%1$p %1$#p", x); // 0xFFFFFFFF -1
- ```
-
- ## Extracting length of a partial conversion ("n" conversion)
-
- C `printf` permits a special `n` conversion which interprets the argument as an
- integral pointer (interpreted size controlled by the length specifier) and
- writes the number of characters printed to that pointer.
-
- JS has no true concept of pointers in the C sense. The library works around
- the limitation by interpreting the argument as an object and assigning to the
- `len` key. The conversion does not write any characters to the output string:
-
- ```js
- var x = {};
- printf("%1$s %2$J%2$n abc", "foo", x); // "foo {} abc", also sets x.len = 6
- // |........| |......| (6 chars at that point)
- ```
-
- This implementation mutates the object while processing:
-
- ```js
- var x = {};
- printf("%1$s %2$J%2$n %3$s %2$J", "foo", x, "bar"); // 'foo {} bar {"len":6}'
- ```
-
- ## Error messages ("m" conversion)
-
- glibc supports an `m` conversion that does not consume arguments. It renders
- the string `strerror(errno)` where `strerror` is the libc function and `errno`
- is the global error number.
-
- JS has no equivalent of `errno` and no standard JS runtime exposes a similar
- global error variable, so `%m` will write the default message `"Success"`. A
- positional parameter or `#` flag changes the behavior:
-
- | form | position | behavior |
- |:------------:|:--------:|---------------------------------------|
- | main | no | do not read argument, emit "Success" |
- | alt (flag #) | no | read and process next argument |
- | main or alt | yes | read and process specified argument |
-
- In all forms other than `"%m"`, an argument will be processed as follows:
-
- - If the argument is not an instance of an `Error`, emit "Success"
- - If the `message` field is set, emit the error message.
- - If the `errno` field is set, emit "Error number " followed by the errno
- - Otherwise emit "Error " followed by the error interpreted as a String
-
- ```
- var x = new Error("sheetjs");
- x.errno = 69; x.toString = function() { return "SHEETJS"; };
- printf("|%#m|", x); // |sheetjs|
- delete x.message;
- printf("|%#m|", x); // |Error number 69|
- delete x.errno;
- printf("|%#m|", x); // |Error SHEETJS|
- ```
-
- # Extensions
-
- These additional conversions take advantage of unused format characters:
-
- ## Rendering Boolean Values ("y" and "Y" conversions)
-
- Values are converted to Boolean and tested for truthiness. The `Y` rendering
- is the uppercase version of the equivalent rendering with format `y`.
-
-
- | form | truthy value `y` (`Y`) | falsy value `y` (`Y`) |
- |:------------:|:-----------------------:|:---------------------:|
- | main | `true` (`TRUE`) | `false` (`FALSE`) |
- | alt (flag #) | `yes` (`YES`) | `no` (`NO`) |
-
- Width and precision are applied in the same manner as the `s` conversion.
-
- ```js
- printf("|%1$y|%2$Y|%1$#Y|%2$#y|%2$.1y|", 1, 0); // |true|FALSE|YES|no|f|
- printf("|%05.2Y|%-5.2y|", 1, 0); // |000TR|fa |
- ```
-
- ## Rendering JSON ("J" conversion)
-
- The default rendering is the standard output from `JSON.stringify`. Alternate
- form (`"#"` flag) renders using `util.inspect` if available.
-
- ```js
- var x = {
- a: [1,[2,3,4],5,6,7],
- b: {
- c: {
- d: { e:"f" },
- g:"h",
- i:"j"
- },
- k:"l",
- m:"n",
- o:"p"},
- q: "r"
- };
- printf("%J", x) // '{"a":[1,[2,3,4],5,6,7],"b":{"c":{"d":{"e":"f"}, ..(ctnd)..
- printf("%#J", x) // '{ a: [ 1, [ 2, 3, 4 ], 5, 6, 7 ],\n b: { c: { ..(ctnd)..
- ```
-
- Width, precision and other flags are ignored.
-
- ## JS typeof and valueOf ("T" and "V" conversion)
-
- Under the "T" conversion, the result of `typeof arg` is rendered. If the `#`
- flag is specified, the type is derived from `Object.prototype.toString`:
-
- ```
- printf("%1$T %1$#T", 1); // 'number Number'
- printf("%1$T %1$#T", 'foo'); // 'string String'
- printf("%1$T %1$#T", [1,2,3]); // 'object Array'
- printf("%1$T %1$#T", null); // 'object Null'
- printf("%1$T %1$#T", undefined); // 'undefined Undefined'
- ```
-
- Under the "V" conversion, the result of `arg.valueOf()` is rendered:
-
- ```
- var _f = function() { return "f"; };
- var _3 = function() { return 3; };
- printf("%1$d %1$s %1$V", {toString:_f}); // '0 f f'
- printf("%1$d %1$s %1$V", {valueOf:_3}); // '3 [object Object] 3'
- printf("%1$d %1$s %1$V", {valueOf:_3, toString:_f}); // '3 f 3'
- ```
-
- ## Rendering Unsigned Integers in Base 2 ("b" and "B" conversions)
-
- The implementation is similar to the octal `"o"` and `"O"` conversions, except
- for the radix (2 for `"b"` and `"B"`) and the alternate-form prefix (`"0b"`)
-
- # Miscellaneous Notes
-
- ## Format Characters
-
- For compatibility purposes, format characters must be printable ASCII characters
- (ASCII codes `0x20 - 0x7E`). The 95 eligible characters are listed below:
-
- | C | Type | C | Type | C | Type | C | Type |
- |-----|:----------:|-----|:----------:|-----|:----------:|-----|:----------:|
- | `a` | conversion | `A` | conversion | ` ` | flag | `!` | |
- | `b` | conversion | `B` | conversion | `"` | | `#` | flag |
- | `c` | conversion | `C` | conversion | `$` | other | `%` | conversion |
- | `d` | conversion | `D` | conversion | `&` | | `'` | flag |
- | `e` | conversion | `E` | conversion | `(` | | `)` | |
- | `f` | conversion | `F` | conversion | `*` | other | `+` | flag |
- | `g` | conversion | `G` | conversion | `,` | | `-` | flag |
- | `h` | length | `H` | | `.` | other | `/` | |
- | `i` | conversion | `I` | length | `0` | digit | `1` | digit |
- | `j` | length | `J` | conversion | `2` | digit | `3` | digit |
- | `k` | | `K` | | `4` | digit | `5` | digit |
- | `l` | length | `L` | length | `6` | digit | `7` | digit |
- | `m` | conversion | `M` | | `8` | digit | `9` | digit |
- | `n` | conversion | `N` | | `:` | | `;` | |
- | `o` | conversion | `O` | conversion | `<` | | `=` | |
- | `p` | conversion | `P` | | `>` | | `?` | |
- | `q` | length | `Q` | | `@` | | `[` | |
- | `r` | | `R` | | `\` | | `]` | |
- | `s` | conversion | `S` | conversion | `^` | | `_` | |
- | `t` | length | `T` | conversion | `~` | | `{` | |
- | `u` | conversion | `U` | conversion | `|` | | `}` | |
- | `v` | | `V` | conversion | `` ` `` | |
- | `w` | length | `W` | |
- | `x` | conversion | `X` | conversion |
- | `y` | conversion | `Y` | conversion |
- | `z` | length | `Z` | length |
-
- ## JS and C strings
-
- C provides no guidance on the actual character set. According to K&R all valid
- characters in source code must be in a character set that is a subset of the
- 7-bit ASCII set. This implementation falls back on the UTF-16 base required by
- JS. When converting C literal strings, there are a few differences in escaping:
-
- | C escape sequence | Equivalent JS | Notes |
- |:------------------|:--------------|:---------------------------------------|
- | `"\a"` | `"\007"` | BEL character will not ring in browser |
- | `"\?"` | `"?"` | JS does not handle trigraphs |
- | `"\ooo"` (octal) | `"\ooo"` | JS uses Latin-1 for non-ASCII codes |
- | `"\xhh"` (hex) | `"\xhh"` | JS uses Latin-1 for non-ASCII codes |
-
- ## Browser Deviations
-
- Opera does not always include the last significant digit in base 16 rendering.
- For example, `(-6.9e-11).toString(16)` is `"0.000000004bddc5fd160168"` in every
- other browser but is `"0.000000004bddc5fd16017"` in Opera. The test suite skips
- the `%a/%A` precision-less formats in Opera.
-
- `Object.prototype.toString.call` gives unexpected results in older browsers, and
- no attempt is made to correct for them. The test suite ignores those cases:
-
- | value | `%#T` expected | `%#T` IE < 9 | `%#T` Android < 4.4 |
- |:------------|:---------------|:-------------|:--------------------|
- | `null` | `"Null"` | `"Object"` | `"global"` |
- | `undefined` | `"Undefined"` | `"Object"` | `"global"` |
-
- ## Support Summary
-
- - Full [POSIX](http://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html) conversion support with extensions!
- [Conversion Specifier Table](#conversion-specifier-quick-reference-table)
- - Full support for POSIX flags and positional parameters
- - Emulation of BSD `quad_t` and `u_quad_t` conversion
- - Parser accepts but does not emulate CRT wide and unicode character conversions
- - glibc `Z` length conversion and extended `m` error support
- - Parser fails on CRT `I32`/`I64` fixed lengths
- - Default `LP64` data model but can be configured to support `ILP32` or `LLP64`
-
|