/* stb_image - v2.26 - public domain image loader - http://nothings.org/stb
                                  no warranty implied; use at your own risk

   Do this:
      #define STB_IMAGE_IMPLEMENTATION
   before you include this file in *one* C or C++ file to create the implementation.

   // i.e. it should look like this:
   #include ...
   #include ...
   #include ...
   #define STB_IMAGE_IMPLEMENTATION
   #include "stb_image.h"

   You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
   And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free


   QUICK NOTES:
      Primarily of interest to game developers and other people who can
          avoid problematic images and only need the trivial interface

      JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
      PNG 1/2/4/8/16-bit-per-channel

      TGA (not sure what subset, if a subset)
      BMP non-RLE (1bpp supported since 2.17)
      PSD (composited view only, no extra channels, 8/16 bit-per-channel)

      GIF (*comp always reports as 4-channel)
      HDR (radiance rgbE format)
      PIC (Softimage PIC)
      PNM (PPM and PGM binary only)

      Animated GIF still needs a proper API, but here's one way to do it:
          http://gist.github.com/urraka/685d9a6340b26b830d49

      - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
      - decode from arbitrary I/O callbacks
      - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)

   Full documentation under "DOCUMENTATION" below.


LICENSE

  See end of file for license information.

RECENT REVISION HISTORY:

      2.26  (2020-07-13) many minor fixes
      2.25  (2020-02-02) fix warnings
      2.24  (2020-02-02) fix warnings; thread-local failure_reason and flip_vertically
      2.23  (2019-08-11) fix clang static analysis warning
      2.22  (2019-03-04) gif fixes, fix warnings
      2.21  (2019-02-25) fix typo in comment
      2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
      2.19  (2018-02-11) fix warning
      2.18  (2018-01-30) fix warnings
      2.17  (2018-01-29) bugfix, 1-bit BMP, 16-bitness query, fix warnings
      2.16  (2017-07-23) all functions have 16-bit variants; optimizations; bugfixes
      2.15  (2017-03-18) fix png-1,2,4; all Imagenet JPGs; no runtime SSE detection on GCC
      2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
      2.13  (2016-12-04) experimental 16-bit API, only for PNG so far; fixes
      2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
      2.11  (2016-04-02) 16-bit PNGS; enable SSE2 in non-gcc x64
                         RGB-format JPEG; remove white matting in PSD;
                         allocate large structures on the stack;
                         correct channel count for PNG & BMP
      2.10  (2016-01-22) avoid warning introduced in 2.09
      2.09  (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED

   See end of file for full revision history.


 ============================    Contributors    =========================

 Image formats                          Extensions, features
    Sean Barrett (jpeg, png, bmp)          Jetro Lauha (stbi_info)
    Nicolas Schulz (hdr, psd)              Martin "SpartanJ" Golini (stbi_info)
    Jonathan Dummer (tga)                  James "moose2000" Brown (iPhone PNG)
    Jean-Marc Lienher (gif)                Ben "Disch" Wenger (io callbacks)
    Tom Seddon (pic)                       Omar Cornut (1/2/4-bit PNG)
    Thatcher Ulrich (psd)                  Nicolas Guillemot (vertical flip)
    Ken Miller (pgm, ppm)                  Richard Mitton (16-bit PSD)
    github:urraka (animated gif)           Junggon Kim (PNM comments)
    Christopher Forseth (animated gif)     Daniel Gibson (16-bit TGA)
                                           socks-the-fox (16-bit PNG)
                                           Jeremy Sawicki (handle all ImageNet JPGs)
 Optimizations & bugfixes                  Mikhail Morozov (1-bit BMP)
    Fabian "ryg" Giesen                    Anael Seghezzi (is-16-bit query)
    Arseny Kapoulkine
    John-Mark Allen
    Carmelo J Fdez-Aguera

 Bug & warning fixes
    Marc LeBlanc            David Woo          Guillaume George     Martins Mozeiko
    Christpher Lloyd        Jerry Jansson      Joseph Thomson       Blazej Dariusz Roszkowski
    Phil Jordan                                Dave Moore           Roy Eltham
    Hayaki Saito            Nathan Reed        Won Chun
    Luke Graham             Johan Duparc       Nick Verigakis       the Horde3D community
    Thomas Ruf              Ronny Chevalier                         github:rlyeh
    Janez Zemva             John Bartholomew   Michal Cichon        github:romigrou
    Jonathan Blow           Ken Hamada         Tero Hanninen        github:svdijk
                            Laurent Gomila     Cort Stratton        github:snagar
    Aruelien Pocheville     Sergio Gonzalez    Thibault Reuille     github:Zelex
    Cass Everitt            Ryamond Barbiero                        github:grim210
    Paul Du Bois            Engin Manap        Aldo Culquicondor    github:sammyhw
    Philipp Wiesemann       Dale Weiler        Oriol Ferrer Mesia   github:phprus
    Josh Tobin                                 Matthew Gregan       github:poppolopoppo
    Julian Raschke          Gregory Mullen     Christian Floisand   github:darealshinji
    Baldur Karlsson         Kevin Schmidt      JR Smith             github:Michaelangel007
                            Brad Weinberger    Matvey Cherevko      [reserved]
    Luca Sas                Alexander Veselov  Zack Middleton       [reserved]
    Ryan C. Gordon          [reserved]                              [reserved]
                     DO NOT ADD YOUR NAME HERE

  To add your name to the credits, pick a random blank space in the middle and fill it.
  80% of merge conflicts on stb PRs are due to people adding their name at the end
  of the credits.
*/

#ifndef STBI_INCLUDE_STB_IMAGE_H
#define STBI_INCLUDE_STB_IMAGE_H

// DOCUMENTATION
//
// Limitations:
//    - no 12-bit-per-channel JPEG
//    - no JPEGs with arithmetic coding
//    - GIF always returns *comp=4
//
// Basic usage (see HDR discussion below for HDR usage):
//    int x,y,n;
//    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
//    // ... process data if not NULL ...
//    // ... x = width, y = height, n = # 8-bit components per pixel ...
//    // ... replace '0' with '1'..'4' to force that many components per pixel
//    // ... but 'n' will always be the number that it would have been if you said 0
//    stbi_image_free(data);
//
// Standard parameters:
//    int *x                 -- outputs image width in pixels
//    int *y                 -- outputs image height in pixels
//    int *channels_in_file  -- outputs # of image components in image file
//    int desired_channels   -- if non-zero, # of image components requested in result
//
// The return value from an image loader is an 'unsigned char *' which points
// to the pixel data, or NULL on an allocation failure or if the image is
// corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
// with each pixel consisting of N interleaved 8-bit components; the first
// pixel pointed to is top-left-most in the image. There is no padding between
// image scanlines or between pixels, regardless of format. The number of
// components N is 'desired_channels' if desired_channels is non-zero, or
// *channels_in_file otherwise. If desired_channels is non-zero,
// *channels_in_file has the number of components that _would_ have been
// output otherwise. E.g. if you set desired_channels to 4, you will always
// get RGBA output, but you can check *channels_in_file to see if it's trivially
// opaque because e.g. there were only 3 channels in the source image.
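//
// A minimal sketch of addressing one pixel (not part of the library; the
// helper 'pixel_at' and the filename are hypothetical). Scanlines are packed
// with no padding and the first pixel is the top-left one, so the components
// of pixel (px,py) start at byte offset (py * x + px) * N:
//
//    #include <stdio.h>
//    #include "stb_image.h"
//
//    static unsigned char *pixel_at(unsigned char *data, int x, int px, int py, int n)
//    {
//       // size_t arithmetic avoids int overflow on large images
//       return data + ((size_t) py * (size_t) x + (size_t) px) * (size_t) n;
//    }
//
//    void print_top_left(void)
//    {
//       int x, y, n;
//       unsigned char *data = stbi_load("photo.png", &x, &y, &n, 4); // force RGBA
//       if (data == NULL) return;
//       unsigned char *p = pixel_at(data, x, 0, 0, 4);   // stride is 4, not n
//       printf("RGBA %d %d %d %d; source had alpha: %s\n", p[0], p[1], p[2], p[3],
//              (n == 2 || n == 4) ? "yes" : "no");
//       stbi_image_free(data);
//    }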
//
// An output image with N components has the following components interleaved
// in this order in each pixel:
//
//     N=#comp     components
//       1           grey
//       2           grey, alpha
//       3           red, green, blue
//       4           red, green, blue, alpha
//
// If image loading fails for any reason, the return value will be NULL,
// and *x, *y, *channels_in_file will be unchanged. The function
// stbi_failure_reason() can be queried for an extremely brief, end-user
// unfriendly explanation of why the load failed. Define STBI_NO_FAILURE_STRINGS
// to avoid compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
// more user-friendly ones.
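//
// For example, here is a minimal error-checking sketch (the filename is a
// placeholder). The reason string may be unavailable, e.g. when
// STBI_NO_FAILURE_STRINGS is defined, so the sketch guards against NULL:
//
//    #include <stdio.h>
//    #include "stb_image.h"
//
//    int main(void)
//    {
//       int x, y, n;
//       unsigned char *data = stbi_load("missing.png", &x, &y, &n, 0);
//       if (data == NULL) {
//          const char *why = stbi_failure_reason();
//          // x, y, n were left unchanged, so they must not be read here
//          fprintf(stderr, "load failed: %s\n", why ? why : "unknown");
//          return 1;
//       }
//       printf("%d x %d, %d channels\n", x, y, n);
//       stbi_image_free(data);
//       return 0;
//    }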
//
// Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
//
// ===========================================================================
//
// UNICODE:
//
//   If compiling for Windows and you wish to use Unicode filenames, compile
//   with
//       #define STBI_WINDOWS_UTF8
//   and pass utf8-encoded filenames. Call stbi_convert_wchar_to_utf8 to convert
//   Windows wchar_t filenames to utf8.
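//
//   A minimal Windows sketch follows ('load_wide' and the buffer size are
//   hypothetical). It treats a zero return from the conversion as failure.
//   That is an assumption based on the conversion wrapping WideCharToMultiByte:
//
//       // needed both where the implementation is created and in callers,
//       // since the declaration is only visible under this define
//       #define STBI_WINDOWS_UTF8
//       #include "stb_image.h"
//
//       unsigned char *load_wide(const wchar_t *wpath, int *x, int *y, int *n)
//       {
//          char path[1024];
//          if (stbi_convert_wchar_to_utf8(path, sizeof(path), wpath) == 0)
//             return NULL;   // conversion failed or path too long
//          return stbi_load(path, x, y, n, 0);
//       }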
//
// ===========================================================================
//
// Philosophy
//
// stb libraries are designed with the following priorities:
//
//    1. easy to use
//    2. easy to maintain
//    3. good performance
//
// Sometimes I let "good performance" creep up in priority over "easy to maintain",
// and for best performance I may provide less-easy-to-use APIs that give higher
// performance, in addition to the easy-to-use ones. Nevertheless, it's important
// to keep in mind that from the standpoint of you, a client of this library,
// all you care about is #1 and #3, and stb libraries DO NOT emphasize #3 above all.
//
// Some secondary priorities arise directly from the first two, some of which
// provide more explicit reasons why performance can't be emphasized.
//
//    - Portable ("ease of use")
//    - Small source code footprint ("easy to maintain")
//    - No dependencies ("ease of use")
//
// ===========================================================================
//
// I/O callbacks
//
// I/O callbacks allow you to read from arbitrary sources, like packaged
// files or some other source. Data read from callbacks are processed
// through a small internal buffer (currently 128 bytes) to try to reduce
// overhead.
//
// The three functions you must define are "read" (reads some bytes of data),
// "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
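//
// A minimal sketch of the three callbacks over a stdio FILE* follows. The my_*
// names are hypothetical. The library already provides stbi_load_from_file
// for this case, so treat this only as a template for other sources.
// Note that "skip" may receive a negative count, meaning "unget" bytes:
//
//    #include <stdio.h>
//    #include "stb_image.h"
//
//    static int my_read(void *user, char *data, int size)
//    {
//       return (int) fread(data, 1, (size_t) size, (FILE *) user);
//    }
//    static void my_skip(void *user, int n)
//    {
//       fseek((FILE *) user, n, SEEK_CUR);   // n < 0 moves backwards
//    }
//    static int my_eof(void *user)
//    {
//       return feof((FILE *) user) || ferror((FILE *) user);
//    }
//
//    unsigned char *load_with_callbacks(FILE *f, int *x, int *y, int *n)
//    {
//       stbi_io_callbacks cb = { my_read, my_skip, my_eof };  // read, skip, eof
//       return stbi_load_from_callbacks(&cb, f, x, y, n, 0);
//    }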
//
// ===========================================================================
//
// SIMD support
//
// The JPEG decoder will try to automatically use SIMD kernels on x86 when
// supported by the compiler. For ARM Neon support, you must explicitly
// request it.
//
// (The old do-it-yourself SIMD API is no longer supported in the current
// code.)
//
// On x86, SSE2 is used automatically when it is available. With MSVC this is
// decided by a run-time CPUID test. With GCC/Clang, SSE2 is used only if the
// compiler is already allowed to emit it (always the case on x64; on 32-bit
// x86 this requires -msse2), and no run-time test is done. Otherwise the
// generic C versions are used as a fall-back. On ARM targets, the typical
// path is to have separate builds for NEON and non-NEON devices (at least
// this is true for iOS and Android). Therefore, NEON support is toggled by
// a build flag: define STBI_NEON to get NEON loops.
//
// If for some reason you do not want to use any of the SIMD code, or if
// you have issues compiling it, you can disable it entirely by
// defining STBI_NO_SIMD.
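//
// For example, an ARM build that wants the NEON kernels might create the
// implementation like this (a sketch; define STBI_NO_SIMD instead to turn
// all SIMD paths off):
//
//    #define STBI_NEON
//    #define STB_IMAGE_IMPLEMENTATION
//    #include "stb_image.h"
//
// On 32-bit x86 with GCC/Clang, pass -msse2 to the compiler to get the SSE2
// kernels. 32-bit MinGW additionally needs -mstackrealign and
// STBI_MINGW_ENABLE_SSE2 (see the notes in the implementation section).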
//
// ===========================================================================
//
// HDR image support   (disable by defining STBI_NO_HDR)
//
// stb_image supports loading HDR images in general, and currently the Radiance
// .HDR file format specifically. You can still load any file through the existing
// interface; if you attempt to load an HDR file, it will be automatically remapped
// to LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
// both of these constants can be reconfigured through this interface:
//
//     stbi_hdr_to_ldr_gamma(2.2f);
//     stbi_hdr_to_ldr_scale(1.0f);
//
// (note, do not use _inverse_ constants; stb_image will invert them
// appropriately).
//
// Additionally, there is a new, parallel interface for loading files as
// (linear) floats to preserve the full dynamic range:
//
//    float *data = stbi_loadf(filename, &x, &y, &n, 0);
//
// If you load LDR images through this interface, those images will
// be promoted to floating point values, run through the inverse of
// constants corresponding to the above:
//
//     stbi_ldr_to_hdr_scale(1.0f);
//     stbi_ldr_to_hdr_gamma(2.2f);
//
// Finally, given a filename (or an open file or memory block--see header
// file for details) containing image data, you can query for the "most
// appropriate" interface to use (that is, whether the image is HDR or
// not), using:
//
//     stbi_is_hdr(char const *filename);
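//
// Putting it together, here is a minimal sketch that picks the interface per
// file. 'load_any' is a hypothetical helper. It assumes STBI_NO_LINEAR is not
// defined, and the caller checks *is_float to know which type it got:
//
//    #include "stb_image.h"
//
//    void *load_any(char const *filename, int *x, int *y, int *n, int *is_float)
//    {
//       *is_float = stbi_is_hdr(filename);
//       if (*is_float)
//          return stbi_loadf(filename, x, y, n, 0);   // linear float components
//       return stbi_load(filename, x, y, n, 0);        // 8-bit components
//    }
//
// Either result is released with stbi_image_free().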
//
// ===========================================================================
//
// iPhone PNG support:
//
// By default we convert iphone-formatted PNGs back to RGB, even though
// they are internally encoded differently. You can disable this conversion
// by calling stbi_convert_iphone_png_to_rgb(0), in which case
// you will always just get the native iphone "format" through (which
// is BGR stored in RGB).
//
// Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
// pixel to remove any premultiplied alpha *only* if the image file explicitly
// says there's premultiplied data (currently only happens in iPhone images,
// and only if iPhone convert-to-rgb processing is on).
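//
// For example (a sketch; the flags are set before loading, and the filename
// is a placeholder). Requesting the conversion explicitly avoids depending
// on the default:
//
//    int x, y, n;
//    stbi_convert_iphone_png_to_rgb(1);   // undo the iPhone BGR encoding
//    stbi_set_unpremultiply_on_load(1);   // only affects premultiplied files
//    unsigned char *data = stbi_load("icon.png", &x, &y, &n, 4);
//    if (data) {
//       // ... use the RGBA data ...
//       stbi_image_free(data);
//    }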
//
// ===========================================================================
//
// ADDITIONAL CONFIGURATION
//
//  - You can suppress implementation of any of the decoders to reduce
//    your code footprint by #defining one or more of the following
//    symbols before creating the implementation.
//
//        STBI_NO_JPEG
//        STBI_NO_PNG
//        STBI_NO_BMP
//        STBI_NO_PSD
//        STBI_NO_TGA
//        STBI_NO_GIF
//        STBI_NO_HDR
//        STBI_NO_PIC
//        STBI_NO_PNM   (.ppm and .pgm)
//
//  - You can request *only* certain decoders and suppress all other ones
//    (this will be more forward-compatible, as addition of new decoders
//    doesn't require you to disable them explicitly):
//
//        STBI_ONLY_JPEG
//        STBI_ONLY_PNG
//        STBI_ONLY_BMP
//        STBI_ONLY_PSD
//        STBI_ONLY_TGA
//        STBI_ONLY_GIF
//        STBI_ONLY_HDR
//        STBI_ONLY_PIC
//        STBI_ONLY_PNM   (.ppm and .pgm)
//
//  - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
//    want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
//
//  - If you define STBI_MAX_DIMENSIONS, stb_image will reject images greater
//    than that size (in either width or height) without further processing.
//    This is to let programs in the wild set an upper bound to prevent
//    denial-of-service attacks on untrusted data, as one could generate a
//    valid image of gigantic dimensions and force stb_image to allocate a
//    huge block of memory and spend disproportionate time decoding it. By
//    default this is set to (1 << 24), which is 16777216, but that's still
//    very big.
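//
//  For example, a translation unit that keeps only the JPEG and PNG
//  decoders and caps image size might look like this (8192 is an arbitrary
//  example limit, not a recommendation):
//
//        #define STBI_ONLY_JPEG
//        #define STBI_ONLY_PNG
//        #define STBI_MAX_DIMENSIONS 8192
//        #define STB_IMAGE_IMPLEMENTATION
//        #include "stb_image.h"
//
//  Since PNG is kept, the zlib decoder stays available without
//  STBI_SUPPORT_ZLIB.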

#ifndef STBI_NO_STDIO
#include <stdio.h>
#endif // STBI_NO_STDIO

#define STBI_VERSION 1

enum
{
   STBI_default = 0, // only used for desired_channels

   STBI_grey       = 1,
   STBI_grey_alpha = 2,
   STBI_rgb        = 3,
   STBI_rgb_alpha  = 4
};

#include <stdlib.h>
typedef unsigned char stbi_uc;
typedef unsigned short stbi_us;

#ifdef __cplusplus
extern "C" {
#endif

#ifndef STBIDEF
#ifdef STB_IMAGE_STATIC
#define STBIDEF static
#else
#define STBIDEF extern
#endif
#endif

//////////////////////////////////////////////////////////////////////////////
//
// PRIMARY API - works on images of any type
//

//
// load image by filename, open file, or memory buffer
//

typedef struct
{
   int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
   void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
   int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
} stbi_io_callbacks;

////////////////////////////////////
//
// 8-bits-per-channel interface
//

STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *channels_in_file, int desired_channels);
STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *channels_in_file, int desired_channels);

#ifndef STBI_NO_STDIO
STBIDEF stbi_uc *stbi_load            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
STBIDEF stbi_uc *stbi_load_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
// for stbi_load_from_file, file pointer is left pointing immediately after image
#endif

#ifndef STBI_NO_GIF
STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
#endif

#ifdef STBI_WINDOWS_UTF8
STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input);
#endif

////////////////////////////////////
//
// 16-bits-per-channel interface
//

STBIDEF stbi_us *stbi_load_16_from_memory   (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels);

#ifndef STBI_NO_STDIO
STBIDEF stbi_us *stbi_load_16          (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
STBIDEF stbi_us *stbi_load_from_file_16(FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
#endif

////////////////////////////////////
//
// float-per-channel interface
//
#ifndef STBI_NO_LINEAR
   STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
   STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y,  int *channels_in_file, int desired_channels);

   #ifndef STBI_NO_STDIO
   STBIDEF float *stbi_loadf            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
   STBIDEF float *stbi_loadf_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
   #endif
#endif

#ifndef STBI_NO_HDR
   STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
   STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
#endif // STBI_NO_HDR

#ifndef STBI_NO_LINEAR
   STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
   STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
#endif // STBI_NO_LINEAR

// stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
#ifndef STBI_NO_STDIO
STBIDEF int      stbi_is_hdr          (char const *filename);
STBIDEF int      stbi_is_hdr_from_file(FILE *f);
#endif // STBI_NO_STDIO


// get a VERY brief reason for failure
// on most compilers (and ALL modern mainstream compilers) this is threadsafe
STBIDEF const char *stbi_failure_reason  (void);

// free the loaded image -- this is just free()
STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);

// get image dimensions & components without fully decoding
STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
STBIDEF int      stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len);
STBIDEF int      stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *clbk, void *user);

#ifndef STBI_NO_STDIO
STBIDEF int      stbi_info               (char const *filename,     int *x, int *y, int *comp);
STBIDEF int      stbi_info_from_file     (FILE *f,                  int *x, int *y, int *comp);
STBIDEF int      stbi_is_16_bit          (char const *filename);
STBIDEF int      stbi_is_16_bit_from_file(FILE *f);
#endif
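
// Example (a sketch; 'probe_and_load' is a hypothetical helper): read only
// the header to get the size and channel count, then pick the 8- or 16-bit
// loader. The caller checks *is16 to know which element type it got.
//
//    void *probe_and_load(char const *filename, int *x, int *y, int *n, int *is16)
//    {
//       if (!stbi_info(filename, x, y, n))
//          return NULL;                 // unknown format or unreadable header
//       *is16 = stbi_is_16_bit(filename);
//       return *is16 ? (void *) stbi_load_16(filename, x, y, n, 0)
//                    : (void *) stbi_load(filename, x, y, n, 0);
//    }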



// for image formats that explicitly notate that they have premultiplied alpha,
// we just return the colors as stored in the file. set this flag to force
// unpremultiplication. results are undefined if the unpremultiply overflows.
STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);

// indicate whether we should process iphone images back to canonical format,
// or just pass them through "as-is"
STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);

// flip the image vertically, so the first pixel in the output array is the bottom left
STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);

// as above, but only applies to images loaded on the thread that calls the function
// this function is only available if your compiler supports thread-local variables;
// calling it will fail to link if your compiler doesn't
STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip);
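
// Example (a sketch; the filename is a placeholder): many OpenGL setups
// treat the first row of texture data as the bottom of the image, so
// enable flipping once before loading textures:
//
//    int w, h, n;
//    stbi_set_flip_vertically_on_load(1);
//    unsigned char *pixels = stbi_load("texture.png", &w, &h, &n, 4);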

// ZLIB client - used by PNG, available for other purposes

STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);

STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
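
// Example (a sketch; 'compressed' and 'compressed_len' are placeholders):
// inflate a zlib stream held in memory. This needs the zlib decoder (PNG
// enabled, or STBI_SUPPORT_ZLIB). The _noheader_ variants are for raw
// deflate data without a zlib header. The result is allocated with
// STBI_MALLOC, so it is released here with stbi_image_free(), which frees
// through STBI_FREE.
//
//    int outlen;
//    char *out = stbi_zlib_decode_malloc(compressed, compressed_len, &outlen);
//    if (out != NULL) {
//       // ... use out[0 .. outlen-1] ...
//       stbi_image_free(out);
//    }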


#ifdef __cplusplus
}
#endif

//
//
////   end header file   /////////////////////////////////////////////////////
#endif // STBI_INCLUDE_STB_IMAGE_H

#ifdef STB_IMAGE_IMPLEMENTATION

#if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
  || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
  || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
  || defined(STBI_ONLY_ZLIB)
   #ifndef STBI_ONLY_JPEG
   #define STBI_NO_JPEG
   #endif
   #ifndef STBI_ONLY_PNG
   #define STBI_NO_PNG
   #endif
   #ifndef STBI_ONLY_BMP
   #define STBI_NO_BMP
   #endif
   #ifndef STBI_ONLY_PSD
   #define STBI_NO_PSD
   #endif
   #ifndef STBI_ONLY_TGA
   #define STBI_NO_TGA
   #endif
   #ifndef STBI_ONLY_GIF
   #define STBI_NO_GIF
   #endif
   #ifndef STBI_ONLY_HDR
   #define STBI_NO_HDR
   #endif
   #ifndef STBI_ONLY_PIC
   #define STBI_NO_PIC
   #endif
   #ifndef STBI_ONLY_PNM
   #define STBI_NO_PNM
   #endif
#endif

#if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
#define STBI_NO_ZLIB
#endif


#include <stdarg.h>
#include <stddef.h> // ptrdiff_t on osx
#include <stdlib.h>
#include <string.h>
#include <limits.h>

#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
#include <math.h>  // ldexp, pow
#endif

#ifndef STBI_NO_STDIO
#include <stdio.h>
#endif

#ifndef STBI_ASSERT
#include <assert.h>
#define STBI_ASSERT(x) assert(x)
#endif

#ifdef __cplusplus
#define STBI_EXTERN extern "C"
#else
#define STBI_EXTERN extern
#endif


#ifndef _MSC_VER
   #ifdef __cplusplus
   #define stbi_inline inline
   #else
   #define stbi_inline
   #endif
#else
   #define stbi_inline __forceinline
#endif

#ifndef STBI_NO_THREAD_LOCALS
   #if defined(__cplusplus) &&  __cplusplus >= 201103L
      #define STBI_THREAD_LOCAL       thread_local
   #elif defined(__GNUC__) && __GNUC__ < 5
      #define STBI_THREAD_LOCAL       __thread
   #elif defined(_MSC_VER)
      #define STBI_THREAD_LOCAL       __declspec(thread)
   #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_THREADS__)
      #define STBI_THREAD_LOCAL       _Thread_local
   #endif

   #ifndef STBI_THREAD_LOCAL
      #if defined(__GNUC__)
        #define STBI_THREAD_LOCAL       __thread
      #endif
   #endif
#endif

#ifdef _MSC_VER
typedef unsigned short stbi__uint16;
typedef   signed short stbi__int16;
typedef unsigned int   stbi__uint32;
typedef   signed int   stbi__int32;
#else
#include <stdint.h>
typedef uint16_t stbi__uint16;
typedef int16_t  stbi__int16;
typedef uint32_t stbi__uint32;
typedef int32_t  stbi__int32;
#endif

// should produce compiler error if size is wrong
typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];

#ifdef _MSC_VER
#define STBI_NOTUSED(v)  (void)(v)
#else
#define STBI_NOTUSED(v)  (void)sizeof(v)
#endif

#ifdef _MSC_VER
#define STBI_HAS_LROTL
#endif

#ifdef STBI_HAS_LROTL
   #define stbi_lrot(x,y)  _lrotl(x,y)
#else
   #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (32 - (y))))
#endif

#if defined(STBI_MALLOC) && defined(STBI_FREE) && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
// ok
#elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) && !defined(STBI_REALLOC_SIZED)
// ok
#else
#error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
#endif

#ifndef STBI_MALLOC
#define STBI_MALLOC(sz)           malloc(sz)
#define STBI_REALLOC(p,newsz)     realloc(p,newsz)
#define STBI_FREE(p)              free(p)
#endif

#ifndef STBI_REALLOC_SIZED
#define STBI_REALLOC_SIZED(p,oldsz,newsz) STBI_REALLOC(p,newsz)
 656 #endif
 657
 658 // x86/x64 detection
 659 #if defined(__x86_64__) || defined(_M_X64)
 660 #define STBI__X64_TARGET
 661 #elif defined(__i386) || defined(_M_IX86)
 662 #define STBI__X86_TARGET
 663 #endif
 664
 665 #if defined(__GNUC__) && defined(STBI__X86_TARGET) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
 666 // gcc doesn't support sse2 intrinsics unless you compile with -msse2,
 667 // which in turn means it gets to use SSE2 everywhere. This is unfortunate,
 668 // but previous attempts to provide the SSE2 functions with runtime
 669 // detection caused numerous issues. The way architecture extensions are
 670 // exposed in GCC/Clang is, sadly, not really suited for one-file libs.
 671 // New behavior: if compiled with -msse2, we use SSE2 without any
 672 // detection; if not, we don't use it at all.
 673 #define STBI_NO_SIMD
 674 #endif
 675
 676 #if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
 677 // Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
 678 //
 679 // 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
 680 // Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
 681 // As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
 682 // simultaneously enabling "-mstackrealign".
 683 //
 684 // See https://github.com/nothings/stb/issues/81 for more information.
 685 //
 686 // So default to no SSE2 on 32-bit MinGW. If you've read this far and added
 687 // -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
 688 #define STBI_NO_SIMD
 689 #endif
 690
 691 #if !defined(STBI_NO_SIMD) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET))
 692 #define STBI_SSE2
 693 #include <emmintrin.h>
 694
 695 #ifdef _MSC_VER
 696
 697 #if _MSC_VER >= 1400  // not VC6
 698 #include <intrin.h> // __cpuid
 699 static int stbi__cpuid3(void)
 700 {
 701    int info[4];
 702    __cpuid(info,1);
 703    return info[3];
 704 }
 705 #else
 706 static int stbi__cpuid3(void)
 707 {
 708    int res;
 709    __asm {
 710       mov  eax,1
 711       cpuid
 712       mov  res,edx
 713    }
 714    return res;
 715 }
 716 #endif
 717
 718 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
 719
 720 #if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
 721 static int stbi__sse2_available(void)
 722 {
 723    int info3 = stbi__cpuid3();
 724    return ((info3 >> 26) & 1) != 0;
 725 }
 726 #endif
 727
 728 #else // assume GCC-style if not VC++
 729 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
 730
 731 #if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
 732 static int stbi__sse2_available(void)
 733 {
 734    // If we're even attempting to compile this on GCC/Clang, that means
 735    // -msse2 is on, which means the compiler is allowed to use SSE2
 736    // instructions at will, and so are we.
 737    return 1;
 738 }
 739 #endif
 740
 741 #endif
 742 #endif

// ARM NEON
#if defined(STBI_NO_SIMD) && defined(STBI_NEON)
#undef STBI_NEON
#endif

#ifdef STBI_NEON
#include <arm_neon.h>
// assume GCC or Clang on ARM targets
#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
#endif

#ifndef STBI_SIMD_ALIGN
#define STBI_SIMD_ALIGN(type, name) type name
#endif

#ifndef STBI_MAX_DIMENSIONS
#define STBI_MAX_DIMENSIONS (1 << 24)
#endif

///////////////////////////////////////////////
//
//  stbi__context struct and start_xxx functions

// stbi__context is the basic context used by all image loaders: it holds
// all of the I/O state plus some basic image information.
typedef struct
{
   stbi__uint32 img_x, img_y;
   int img_n, img_out_n;

   stbi_io_callbacks io;
   void *io_user_data;

   int read_from_callbacks;
   int buflen;
   stbi_uc buffer_start[128];
   int callback_already_read;

   stbi_uc *img_buffer, *img_buffer_end;
   stbi_uc *img_buffer_original, *img_buffer_original_end;
} stbi__context;


static void stbi__refill_buffer(stbi__context *s);

// initialize a memory-decode context
static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
{
   s->io.read = NULL;
   s->read_from_callbacks = 0;
   s->callback_already_read = 0;
   s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
   s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
}

// initialize a callback-based context
static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
{
   s->io = *c;
   s->io_user_data = user;
   s->buflen = sizeof(s->buffer_start);
   s->read_from_callbacks = 1;
   s->callback_already_read = 0;
   s->img_buffer = s->img_buffer_original = s->buffer_start;
   stbi__refill_buffer(s);
   s->img_buffer_original_end = s->img_buffer_end;
}

#ifndef STBI_NO_STDIO

static int stbi__stdio_read(void *user, char *data, int size)
{
   return (int) fread(data,1,size,(FILE*) user);
}

static void stbi__stdio_skip(void *user, int n)
{
   int ch;
   fseek((FILE*) user, n, SEEK_CUR);
   ch = fgetc((FILE*) user);  /* fseek() clears the EOF flag, so read a byte to make feof() reflect the new position */
   if (ch != EOF) {
      ungetc(ch, (FILE *) user);  /* push the byte back onto the stream if valid */
   }
}

static int stbi__stdio_eof(void *user)
{
   return feof((FILE*) user) || ferror((FILE *) user);
}

static stbi_io_callbacks stbi__stdio_callbacks =
{
   stbi__stdio_read,
   stbi__stdio_skip,
   stbi__stdio_eof,
};

static void stbi__start_file(stbi__context *s, FILE *f)
{
   stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
}

//static void stop_file(stbi__context *s) { }

#endif // !STBI_NO_STDIO

static void stbi__rewind(stbi__context *s)
{
   // Conceptually, rewind SHOULD go back to the beginning of the stream,
   // but we only rewind to the beginning of the initial buffer, because
   // we only use it after doing 'test', which never looks at more than 92 bytes.
   s->img_buffer = s->img_buffer_original;
   s->img_buffer_end = s->img_buffer_original_end;
}

enum
{
   STBI_ORDER_RGB,
   STBI_ORDER_BGR
};

typedef struct
{
   int bits_per_channel;
   int num_channels;
   int channel_order;
} stbi__result_info;

#ifndef STBI_NO_JPEG
static int      stbi__jpeg_test(stbi__context *s);
static void    *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PNG
static int      stbi__png_test(stbi__context *s);
static void    *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
static int      stbi__png_is16(stbi__context *s);
#endif

#ifndef STBI_NO_BMP
static int      stbi__bmp_test(stbi__context *s);
static void    *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_TGA
static int      stbi__tga_test(stbi__context *s);
static void    *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PSD
static int      stbi__psd_test(stbi__context *s);
static void    *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc);
static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
static int      stbi__psd_is16(stbi__context *s);
#endif

#ifndef STBI_NO_HDR
static int      stbi__hdr_test(stbi__context *s);
static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PIC
static int      stbi__pic_test(stbi__context *s);
static void    *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_GIF
static int      stbi__gif_test(stbi__context *s);
static void    *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
static void    *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PNM
static int      stbi__pnm_test(stbi__context *s);
static void    *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
#endif

static
#ifdef STBI_THREAD_LOCAL
STBI_THREAD_LOCAL
#endif
const char *stbi__g_failure_reason;

STBIDEF const char *stbi_failure_reason(void)
{
   return stbi__g_failure_reason;
}
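
// Usage sketch (illustrative only; "texture.png" is a placeholder name): the
// stbi_load* functions return NULL on failure, and stbi_failure_reason() then
// describes the error. If STBI_NO_FAILURE_STRINGS is defined the reason is
// never set, so it may be NULL.
//
//    int w, h, comp;
//    unsigned char *pixels = stbi_load("texture.png", &w, &h, &comp, 0);
//    if (pixels == NULL) {
//       const char *why = stbi_failure_reason();
//       fprintf(stderr, "load failed: %s\n", why ? why : "unknown");
//    } else {
//       /* use the w*h*comp bytes of pixel data */
//       stbi_image_free(pixels);
//    }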

#ifndef STBI_NO_FAILURE_STRINGS
static int stbi__err(const char *str)
{
   stbi__g_failure_reason = str;
   return 0;
}
#endif

static void *stbi__malloc(size_t size)
{
   return STBI_MALLOC(size);
}

// stb_image uses ints pervasively, including for offset calculations.
// Therefore the largest decoded image size we can support with the
// current code, even on 64-bit targets, is INT_MAX. This is not a
// significant limitation for the intended use case.
//
// We do, however, need to make sure our size calculations don't
// overflow. Hence a few helper functions for size calculations that
// multiply integers together, making sure that they're non-negative
// and no overflow occurs.

// return 1 if the sum is valid, 0 on overflow.
// a negative b is rejected; a is assumed non-negative (it is not checked).
static int stbi__addsizes_valid(int a, int b)
{
   if (b < 0) return 0;
   // now 0 <= b <= INT_MAX, hence also
   // 0 <= INT_MAX - b <= INT_MAX.
   // And "a + b <= INT_MAX" (which might overflow) is the
   // same as a <= INT_MAX - b (no overflow)
   return a <= INT_MAX - b;
}

// returns 1 if the product is valid, 0 on overflow.
// negative factors are considered invalid.
static int stbi__mul2sizes_valid(int a, int b)
{
   if (a < 0 || b < 0) return 0;
   if (b == 0) return 1; // mul-by-0 is always safe
   // portable way to check for no overflows in a*b
   return a <= INT_MAX/b;
}
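
// Worked example (assuming a 32-bit int, INT_MAX = 2147483647): a 65536 x 65536
// product is rejected because INT_MAX/65536 = 32767 and 65536 > 32767, so the
// overflowing multiplication 65536*65536 is never evaluated. 46340 is the
// largest value whose square still fits.
//
//    stbi__mul2sizes_valid(65536, 65536);  // 0: product exceeds INT_MAX
//    stbi__mul2sizes_valid(46340, 46340);  // 1: 2147395600 <= INT_MAX
//    stbi__mul2sizes_valid(46341, 46341);  // 0: INT_MAX/46341 == 46340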

#if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
// returns 1 if "a*b + add" has no negative terms/factors and doesn't overflow
static int stbi__mad2sizes_valid(int a, int b, int add)
{
   return stbi__mul2sizes_valid(a, b) && stbi__addsizes_valid(a*b, add);
}
#endif

// returns 1 if "a*b*c + add" has no negative terms/factors and doesn't overflow
static int stbi__mad3sizes_valid(int a, int b, int c, int add)
{
   return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
      stbi__addsizes_valid(a*b*c, add);
}

// returns 1 if "a*b*c*d + add" has no negative terms/factors and doesn't overflow
#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
static int stbi__mad4sizes_valid(int a, int b, int c, int d, int add)
{
   return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
      stbi__mul2sizes_valid(a*b*c, d) && stbi__addsizes_valid(a*b*c*d, add);
}
#endif

#if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
// mallocs with size overflow checking
static void *stbi__malloc_mad2(int a, int b, int add)
{
   if (!stbi__mad2sizes_valid(a, b, add)) return NULL;
   return stbi__malloc(a*b + add);
}
#endif

static void *stbi__malloc_mad3(int a, int b, int c, int add)
{
   if (!stbi__mad3sizes_valid(a, b, c, add)) return NULL;
   return stbi__malloc(a*b*c + add);
}

#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
static void *stbi__malloc_mad4(int a, int b, int c, int d, int add)
{
   if (!stbi__mad4sizes_valid(a, b, c, d, add)) return NULL;
   return stbi__malloc(a*b*c*d + add);
}
#endif

// stbi__err - error
// stbi__errpf - error returning pointer to float
// stbi__errpuc - error returning pointer to unsigned char

#ifdef STBI_NO_FAILURE_STRINGS
   #define stbi__err(x,y)  0
#elif defined(STBI_FAILURE_USERMSG)
   #define stbi__err(x,y)  stbi__err(y)
#else
   #define stbi__err(x,y)  stbi__err(x)
#endif

#define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
#define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))

STBIDEF void stbi_image_free(void *retval_from_stbi_load)
{
   STBI_FREE(retval_from_stbi_load);
}

#ifndef STBI_NO_LINEAR
static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
#endif

#ifndef STBI_NO_HDR
static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
#endif

static int stbi__vertically_flip_on_load_global = 0;

STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
{
   stbi__vertically_flip_on_load_global = flag_true_if_should_flip;
}

#ifndef STBI_THREAD_LOCAL
#define stbi__vertically_flip_on_load  stbi__vertically_flip_on_load_global
#else
static STBI_THREAD_LOCAL int stbi__vertically_flip_on_load_local, stbi__vertically_flip_on_load_set;

STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip)
{
   stbi__vertically_flip_on_load_local = flag_true_if_should_flip;
   stbi__vertically_flip_on_load_set = 1;
}

#define stbi__vertically_flip_on_load  (stbi__vertically_flip_on_load_set       \
                                         ? stbi__vertically_flip_on_load_local  \
                                         : stbi__vertically_flip_on_load_global)
#endif // STBI_THREAD_LOCAL
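
// Usage sketch (illustrative only; "sprite.png" is a placeholder name): stb_image
// returns the top row first, while OpenGL treats the first row of texture data
// as the bottom, so a common pattern is to request flipping before loading.
// stbi_set_flip_vertically_on_load_thread() exists only when STBI_THREAD_LOCAL
// is defined, and once called it overrides the global setting for that thread.
//
//    int w, h, comp;
//    stbi_set_flip_vertically_on_load(1);          // global setting
//    unsigned char *img = stbi_load("sprite.png", &w, &h, &comp, 4);
//
//    stbi_set_flip_vertically_on_load_thread(0);   // this thread only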

static void *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
{
   memset(ri, 0, sizeof(*ri)); // make sure it's initialized if we add new fields
   ri->bits_per_channel = 8; // default is 8 so most paths don't have to be changed
   ri->channel_order = STBI_ORDER_RGB; // all current input & output are this, but this is here so we can add BGR order
   ri->num_channels = 0;

   #ifndef STBI_NO_JPEG
   if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp, ri);
   #endif
   #ifndef STBI_NO_PNG
   if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp, ri);
   #endif
   #ifndef STBI_NO_BMP
   if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp, ri);
   #endif
   #ifndef STBI_NO_GIF
   if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp, ri);
   #endif
   #ifndef STBI_NO_PSD
   if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp, ri, bpc);
   #else
   STBI_NOTUSED(bpc);
   #endif
   #ifndef STBI_NO_PIC
   if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp, ri);
   #endif
   #ifndef STBI_NO_PNM
   if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp, ri);
   #endif

   #ifndef STBI_NO_HDR
   if (stbi__hdr_test(s)) {
      float *hdr = stbi__hdr_load(s, x,y,comp,req_comp, ri);
      return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
   }
   #endif

   #ifndef STBI_NO_TGA
   // test TGA last: TGA files have no magic number, so its test is weak
   if (stbi__tga_test(s))
      return stbi__tga_load(s,x,y,comp,req_comp, ri);
   #endif

   return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
}

static stbi_uc *stbi__convert_16_to_8(stbi__uint16 *orig, int w, int h, int channels)
{
   int i;
   int img_len = w * h * channels;
   stbi_uc *reduced;

   reduced = (stbi_uc *) stbi__malloc(img_len);
   if (reduced == NULL) {
      STBI_FREE(orig); // the caller overwrites its only pointer to orig with our result
      return stbi__errpuc("outofmem", "Out of memory");
   }

   for (i = 0; i < img_len; ++i)
      reduced[i] = (stbi_uc)((orig[i] >> 8) & 0xFF); // keeping the high byte of each 16-bit sample is a sufficient approximation of 16->8 bit scaling

   STBI_FREE(orig);
   return reduced;
}

static stbi__uint16 *stbi__convert_8_to_16(stbi_uc *orig, int w, int h, int channels)
{
   int i;
   int img_len = w * h * channels;
   stbi__uint16 *enlarged;

   enlarged = (stbi__uint16 *) stbi__malloc(img_len*2);
   if (enlarged == NULL) {
      STBI_FREE(orig); // the caller overwrites its only pointer to orig with our result
      return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
   }

   for (i = 0; i < img_len; ++i)
      enlarged[i] = (stbi__uint16)((orig[i] << 8) + orig[i]); // replicate to high and low byte, maps 0->0, 255->0xffff

   STBI_FREE(orig);
   return enlarged;
}
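
// Worked example of the two conversions above, per channel value:
//    16 -> 8 bit: 0xABCD -> 0xAB     (keep the high byte; 0xFFFF -> 0xFF)
//    8 -> 16 bit: 0xAB   -> 0xABAB   (x*257, so 0x00 -> 0x0000, 0xFF -> 0xFFFF)
// An 8 -> 16 -> 8 round trip is lossless, since (x*257) >> 8 == x for 0 <= x <= 255.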

// flip an image in place by swapping row r with row h-1-r
static void stbi__vertical_flip(void *image, int w, int h, int bytes_per_pixel)
{
   int row;
   size_t bytes_per_row = (size_t)w * bytes_per_pixel;
   stbi_uc temp[2048];
   stbi_uc *bytes = (stbi_uc *)image;

   for (row = 0; row < (h>>1); row++) {
      stbi_uc *row0 = bytes + row*bytes_per_row;
      stbi_uc *row1 = bytes + (h - row - 1)*bytes_per_row;
      // swap row0 with row1, in chunks of at most sizeof(temp) bytes
      size_t bytes_left = bytes_per_row;
      while (bytes_left) {
         size_t bytes_copy = (bytes_left < sizeof(temp)) ? bytes_left : sizeof(temp);
         memcpy(temp, row0, bytes_copy);
         memcpy(row0, row1, bytes_copy);
         memcpy(row1, temp, bytes_copy);
         row0 += bytes_copy;
         row1 += bytes_copy;
         bytes_left -= bytes_copy;
      }
   }
}

#ifndef STBI_NO_GIF
// flip each of the z consecutive w x h frames independently
static void stbi__vertical_flip_slices(void *image, int w, int h, int z, int bytes_per_pixel)
{
   int slice;
   int slice_size = w * h * bytes_per_pixel;

   stbi_uc *bytes = (stbi_uc *)image;
   for (slice = 0; slice < z; ++slice) {
      stbi__vertical_flip(bytes, w, h, bytes_per_pixel);
      bytes += slice_size;
   }
}
#endif

static unsigned char *stbi__load_and_postprocess_8bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi__result_info ri;
   void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 8);

   if (result == NULL)
      return NULL;

   // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
   STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);

   if (ri.bits_per_channel != 8) {
      result = stbi__convert_16_to_8((stbi__uint16 *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
      ri.bits_per_channel = 8;
      if (result == NULL)
         return NULL; // conversion failed (out of memory); the failure reason is already set
   }

   // @TODO: move stbi__convert_format to here

   if (stbi__vertically_flip_on_load) {
      int channels = req_comp ? req_comp : *comp;
      stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi_uc));
   }

   return (unsigned char *) result;
}

static stbi__uint16 *stbi__load_and_postprocess_16bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi__result_info ri;
   void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 16);

   if (result == NULL)
      return NULL;

   // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
   STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);

   if (ri.bits_per_channel != 16) {
      result = stbi__convert_8_to_16((stbi_uc *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
      ri.bits_per_channel = 16;
      if (result == NULL)
         return NULL; // conversion failed (out of memory); the failure reason is already set
   }

   // @TODO: move stbi__convert_format16 to here
   // @TODO: special case RGB-to-Y (and RGBA-to-YA) for 8-bit-to-16-bit case to keep more precision

   if (stbi__vertically_flip_on_load) {
      int channels = req_comp ? req_comp : *comp;
      stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi__uint16));
   }

   return (stbi__uint16 *) result;
}

#if !defined(STBI_NO_HDR) && !defined(STBI_NO_LINEAR)
static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
{
   if (stbi__vertically_flip_on_load && result != NULL) {
      int channels = req_comp ? req_comp : *comp;
      stbi__vertical_flip(result, *x, *y, channels * sizeof(float));
   }
}
#endif

#ifndef STBI_NO_STDIO

#if defined(_MSC_VER) && defined(STBI_WINDOWS_UTF8)
STBI_EXTERN __declspec(dllimport) int __stdcall MultiByteToWideChar(unsigned int cp, unsigned long flags, const char *str, int cbmb, wchar_t *widestr, int cchwide);
STBI_EXTERN __declspec(dllimport) int __stdcall WideCharToMultiByte(unsigned int cp, unsigned long flags, const wchar_t *widestr, int cchwide, char *str, int cbmb, const char *defchar, int *used_default);
#endif

#if defined(_MSC_VER) && defined(STBI_WINDOWS_UTF8)
STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input)
{
   return WideCharToMultiByte(65001 /* UTF8 */, 0, input, -1, buffer, (int) bufferlen, NULL, NULL);
}
#endif

static FILE *stbi__fopen(char const *filename, char const *mode)
{
   FILE *f;
#if defined(_MSC_VER) && defined(STBI_WINDOWS_UTF8)
   wchar_t wMode[64];
   wchar_t wFilename[1024];
   // MultiByteToWideChar takes the destination size in wide characters, not bytes
   if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, filename, -1, wFilename, sizeof(wFilename)/sizeof(*wFilename)))
      return 0;

   if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, mode, -1, wMode, sizeof(wMode)/sizeof(*wMode)))
      return 0;

#if _MSC_VER >= 1400
   if (0 != _wfopen_s(&f, wFilename, wMode))
      f = 0;
#else
   f = _wfopen(wFilename, wMode);
#endif

#elif defined(_MSC_VER) && _MSC_VER >= 1400
   if (0 != fopen_s(&f, filename, mode))
      f=0;
#else
   f = fopen(filename, mode);
#endif
   return f;
}


STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
{
   FILE *f = stbi__fopen(filename, "rb");
   unsigned char *result;
   if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
   result = stbi_load_from_file(f,x,y,comp,req_comp);
   fclose(f);
   return result;
}

STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
{
   unsigned char *result;
   stbi__context s;
   stbi__start_file(&s,f);
   result = stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
   if (result) {
      // seek back over bytes that were buffered but not consumed, so the
      // FILE is left positioned just past the image data
      fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
   }
   return result;
}

STBIDEF stbi__uint16 *stbi_load_from_file_16(FILE *f, int *x, int *y, int *comp, int req_comp)
{
   stbi__uint16 *result;
   stbi__context s;
   stbi__start_file(&s,f);
   result = stbi__load_and_postprocess_16bit(&s,x,y,comp,req_comp);
   if (result) {
      // seek back over bytes that were buffered but not consumed, so the
      // FILE is left positioned just past the image data
      fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
   }
   return result;
}

STBIDEF stbi_us *stbi_load_16(char const *filename, int *x, int *y, int *comp, int req_comp)
{
   FILE *f = stbi__fopen(filename, "rb");
   stbi__uint16 *result;
   if (!f) return (stbi_us *) stbi__errpuc("can't fopen", "Unable to open file");
   result = stbi_load_from_file_16(f,x,y,comp,req_comp);
   fclose(f);
   return result;
}


#endif //!STBI_NO_STDIO

STBIDEF stbi_us *stbi_load_16_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels)
{
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
}

STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels)
{
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
   return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
}

STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
{
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
}

STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
{
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
}
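
// Usage sketch for stbi_load_from_callbacks (illustrative only): the three
// stbi_io_callbacks members follow the same pattern as stbi__stdio_callbacks
// above. The mem_reader type and the mem_* functions are hypothetical names, and
// a memory reader is used only to keep the sketch small (stbi_load_from_memory
// already covers that case). A negative n passed to skip means "unget" -n bytes.
//
//    typedef struct { const unsigned char *data; int size, pos; } mem_reader;
//
//    static int mem_read(void *user, char *out, int n) {
//       mem_reader *m = (mem_reader *) user;
//       int left = m->size - m->pos;
//       if (n > left) n = left;
//       memcpy(out, m->data + m->pos, n);
//       m->pos += n;
//       return n;                      // number of bytes actually read
//    }
//    static void mem_skip(void *user, int n) {
//       mem_reader *m = (mem_reader *) user;
//       m->pos += n;
//       if (m->pos > m->size) m->pos = m->size;
//       if (m->pos < 0) m->pos = 0;
//    }
//    static int mem_eof(void *user) {
//       mem_reader *m = (mem_reader *) user;
//       return m->pos >= m->size;      // nonzero at end of data
//    }
//
//    // bytes/nbytes stand for an existing encoded image in memory
//    stbi_io_callbacks cb = { mem_read, mem_skip, mem_eof };
//    mem_reader m = { bytes, nbytes, 0 };
//    int w, h, comp;
//    unsigned char *img = stbi_load_from_callbacks(&cb, &m, &w, &h, &comp, 0);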

#ifndef STBI_NO_GIF
STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
{
   unsigned char *result;
   stbi__context s;
   stbi__start_mem(&s,buffer,len);

   result = (unsigned char*) stbi__load_gif_main(&s, delays, x, y, z, comp, req_comp);
   if (result && stbi__vertically_flip_on_load) {
      // the frames hold req_comp channels when it is set, otherwise *comp
      int channels = req_comp ? req_comp : *comp;
      stbi__vertical_flip_slices( result, *x, *y, *z, channels );
   }

   return result;
}
#endif

#ifndef STBI_NO_LINEAR
static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   unsigned char *data;
   #ifndef STBI_NO_HDR
   if (stbi__hdr_test(s)) {
      stbi__result_info ri;
      float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp, &ri);
      if (hdr_data)
         stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
      return hdr_data;
   }
   #endif
   data = stbi__load_and_postprocess_8bit(s, x, y, comp, req_comp);
   if (data)
      return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
   return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
}

STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
{
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi__loadf_main(&s,x,y,comp,req_comp);
}

STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
{
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   return stbi__loadf_main(&s,x,y,comp,req_comp);
}

#ifndef STBI_NO_STDIO
STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
{
   float *result;
   FILE *f = stbi__fopen(filename, "rb");
   if (!f) return stbi__errpf("can't fopen", "Unable to open file");
   result = stbi_loadf_from_file(f,x,y,comp,req_comp);
   fclose(f);
   return result;
}

STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
{
   stbi__context s;
   stbi__start_file(&s,f);
   return stbi__loadf_main(&s,x,y,comp,req_comp);
}
#endif // !STBI_NO_STDIO

#endif // !STBI_NO_LINEAR

// These is-HDR queries are defined regardless of whether STBI_NO_LINEAR is
// defined, for API simplicity; if STBI_NO_HDR is defined, they always
// report false!

STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
{
   #ifndef STBI_NO_HDR
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi__hdr_test(&s);
   #else
   STBI_NOTUSED(buffer);
   STBI_NOTUSED(len);
   return 0;
   #endif
}

#ifndef STBI_NO_STDIO
STBIDEF int      stbi_is_hdr          (char const *filename)
{
   FILE *f = stbi__fopen(filename, "rb");
   int result=0;
   if (f) {
      result = stbi_is_hdr_from_file(f);
      fclose(f);
   }
   return result;
}

STBIDEF int stbi_is_hdr_from_file(FILE *f)
{
   #ifndef STBI_NO_HDR
   long pos = ftell(f);
   int res;
   stbi__context s;
   stbi__start_file(&s,f);
   res = stbi__hdr_test(&s);
   fseek(f, pos, SEEK_SET);
   return res;
   #else
   STBI_NOTUSED(f);
   return 0;
   #endif
}
#endif // !STBI_NO_STDIO

STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
{
   #ifndef STBI_NO_HDR
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   return stbi__hdr_test(&s);
   #else
   STBI_NOTUSED(clbk);
   STBI_NOTUSED(user);
   return 0;
   #endif
}
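
// Usage sketch (illustrative only; "scene.hdr" is a placeholder name): use the
// float loader for HDR input so its extended range is preserved, and the 8-bit
// loader otherwise.
//
//    int w, h, comp;
//    if (stbi_is_hdr("scene.hdr")) {
//       float *texels = stbi_loadf("scene.hdr", &w, &h, &comp, 0);
//       /* use w*h*comp floats */
//       stbi_image_free(texels);
//    } else {
//       unsigned char *texels = stbi_load("scene.hdr", &w, &h, &comp, 0);
//       /* use w*h*comp bytes */
//       stbi_image_free(texels);
//    }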

#ifndef STBI_NO_LINEAR
static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;

STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
#endif

static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;

STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
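
// For reference (a summary of stbi__ldr_to_hdr and stbi__hdr_to_ldr, which are
// defined later in this file and not shown here): each non-alpha channel is
// mapped roughly as follows, while alpha is scaled linearly by 1/255 or 255.
//
//    hdr = pow(ldr / 255, l2h_gamma) * l2h_scale
//    ldr = clamp(pow(hdr * h2l_scale_i, h2l_gamma_i) * 255 + 0.5, 0, 255)
//
// With the defaults (gamma 2.2, scale 1), an 8-bit value of 128 becomes
// pow(128/255, 2.2), approximately 0.22.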


//////////////////////////////////////////////////////////////////////////////
//
// Common code used by all image loaders
//

enum
{
   STBI__SCAN_load=0,
   STBI__SCAN_type,
   STBI__SCAN_header
};

static void stbi__refill_buffer(stbi__context *s)
{
   int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
   s->callback_already_read += (int) (s->img_buffer - s->img_buffer_original);
   if (n == 0) {
      // at end of file, treat same as if from memory, but need to handle case
      // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
      s->read_from_callbacks = 0;
      s->img_buffer = s->buffer_start;
      s->img_buffer_end = s->buffer_start+1;
      *s->img_buffer = 0;
   } else {
      s->img_buffer = s->buffer_start;
      s->img_buffer_end = s->buffer_start + n;
   }
}

stbi_inline static stbi_uc stbi__get8(stbi__context *s)
{
   if (s->img_buffer < s->img_buffer_end)
      return *s->img_buffer++;
   if (s->read_from_callbacks) {
      stbi__refill_buffer(s);
      return *s->img_buffer++;
   }
   return 0;
}

#if defined(STBI_NO_JPEG) && defined(STBI_NO_HDR) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
// nothing
#else
stbi_inline static int stbi__at_eof(stbi__context *s)
{
   if (s->io.read) {
      if (!(s->io.eof)(s->io_user_data)) return 0;
      // the underlying stream is at EOF, so we are at EOF once the buffer is consumed.
      // Special case: refill_buffer already hit EOF and left only its synthetic 0 byte.
      if (s->read_from_callbacks == 0) return 1;
   }

   return s->img_buffer >= s->img_buffer_end;
}
#endif

#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC)
// nothing
#else
static void stbi__skip(stbi__context *s, int n)
{
   if (n == 0) return;  // already there!
   if (n < 0) {
      s->img_buffer = s->img_buffer_end;
      return;
   }
   if (s->io.read) {
      int blen = (int) (s->img_buffer_end - s->img_buffer);
      if (blen < n) {
         s->img_buffer = s->img_buffer_end;
         (s->io.skip)(s->io_user_data, n - blen);
         return;
      }
   }
   s->img_buffer += n;
}
#endif

#if defined(STBI_NO_PNG) && defined(STBI_NO_TGA) && defined(STBI_NO_HDR) && defined(STBI_NO_PNM)
// nothing
#else
static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
{
   if (s->io.read) {
      int blen = (int) (s->img_buffer_end - s->img_buffer);
      if (blen < n) {
         int res, count;

         memcpy(buffer, s->img_buffer, blen);

         count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
         res = (count == (n-blen));
         s->img_buffer = s->img_buffer_end;
         return res;
      }
   }

   if (s->img_buffer+n <= s->img_buffer_end) {
      memcpy(buffer, s->img_buffer, n);
      s->img_buffer += n;
      return 1;
   } else
      return 0;
}
#endif

1631 #if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
1632 // nothing
1633 #else
1634 static int stbi__get16be(stbi__context *s)
1635 {
1636    int z = stbi__get8(s);
1637    return (z << 8) + stbi__get8(s);
1638 }
1639 #endif
1640
1641 #if defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
1642 // nothing
1643 #else
1644 static stbi__uint32 stbi__get32be(stbi__context *s)
1645 {
1646    stbi__uint32 z = stbi__get16be(s);
1647    return (z << 16) + stbi__get16be(s);
1648 }
1649 #endif
1650
1651 #if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
1652 // nothing
1653 #else
1654 static int stbi__get16le(stbi__context *s)
1655 {
1656    int z = stbi__get8(s);
1657    return z + (stbi__get8(s) << 8);
1658 }
1659 #endif
1660
1661 #ifndef STBI_NO_BMP
1662 static stbi__uint32 stbi__get32le(stbi__context *s)
1663 {
1664    stbi__uint32 z = stbi__get16le(s);
1665    return z + (stbi__get16le(s) << 16);
1666 }
1667 #endif
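
// Illustrative sketch, not part of stb_image: the stbi__get16be/get32be and
// stbi__get16le/get32le helpers above build multi-byte integers out of single
// stbi__get8 reads, high byte first for the big-endian formats in their guards
// (JPEG, PNG, PSD, PIC) and low byte first for the little-endian ones (BMP,
// TGA, GIF). The hypothetical example_* helpers below repeat that composition
// over a plain memory cursor instead of an stbi__context, returning 0 past the
// end of the buffer, much as stbi__get8 does once input runs out.
typedef struct { const unsigned char *p, *end; } example_cursor;

static int example_get8(example_cursor *c)
{
   return (c->p < c->end) ? *c->p++ : 0;
}

static int example_get16be(example_cursor *c)
{
   int z = example_get8(c);               // high byte first
   return (z << 8) + example_get8(c);
}

static unsigned long example_get32be(example_cursor *c)
{
   unsigned long z = (unsigned long) example_get16be(c);
   return (z << 16) + (unsigned long) example_get16be(c);
}

static int example_get16le(example_cursor *c)
{
   int z = example_get8(c);               // low byte first
   return z + (example_get8(c) << 8);
}

static unsigned long example_get32le(example_cursor *c)
{
   unsigned long z = (unsigned long) example_get16le(c);
   return z + ((unsigned long) example_get16le(c) << 16);
}
// For the bytes {0x12, 0x34, 0x56, 0x78}, example_get32be gives 0x12345678
// and example_get32le gives 0x78563412.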

#define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings

#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
// nothing
#else
//////////////////////////////////////////////////////////////////////////////
//
//  generic converter from built-in img_n to req_comp
//    individual types do this automatically as much as possible (e.g. jpeg
//    does all cases internally since it needs to colorspace convert anyway,
//    and it never has alpha, so very few cases ). png can automatically
//    interleave an alpha=255 channel, but falls back to this for other cases
//
//  assume data buffer is malloced, so malloc a new one and free that one
//  only failure mode is malloc failing

static stbi_uc stbi__compute_y(int r, int g, int b)
{
   return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
}
#endif
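
// Illustrative sketch, not part of stb_image: stbi__compute_y approximates
// Rec. 601 luma, Y = 0.299 R + 0.587 G + 0.114 B, in 8.8 fixed point. The
// weights 77, 150 and 29 are those coefficients times 256, rounded to
// integers that sum to exactly 256; white therefore maps to 255, and the
// final >> 8 truncates rather than rounds. The hypothetical helpers below put
// the fixed-point and floating-point forms side by side.
static int example_luma_fixed(int r, int g, int b)
{
   return ((r * 77) + (g * 150) + (b * 29)) >> 8;
}

static double example_luma_float(int r, int g, int b)
{
   return 0.299 * r + 0.587 * g + 0.114 * b;
}
// The two agree to within 1.5 for 8-bit inputs; for example, pure red
// (255, 0, 0) gives 76 in fixed point and about 76.2 in floating point.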

#if defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
// nothing
#else
static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
{
   int i,j;
   unsigned char *good;

   if (req_comp == img_n) return data;
   STBI_ASSERT(req_comp >= 1 && req_comp <= 4);

   good = (unsigned char *) stbi__malloc_mad3(req_comp, x, y, 0);
   if (good == NULL) {
      STBI_FREE(data);
      return stbi__errpuc("outofmem", "Out of memory");
   }

   for (j=0; j < (int) y; ++j) {
      unsigned char *src  = data + j * x * img_n   ;
      unsigned char *dest = good + j * x * req_comp;

      #define STBI__COMBO(a,b)  ((a)*8+(b))
      #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
      // convert source image with img_n components to one with req_comp components;
      // avoid switch per pixel, so use switch per scanline and massive macros
      switch (STBI__COMBO(img_n, req_comp)) {
         STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=255;                                     } break;
         STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
         STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=255;                     } break;
         STBI__CASE(2,1) { dest[0]=src[0];                                                  } break;
         STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
         STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                  } break;
         STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=255;        } break;
         STBI__CASE(3,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
         STBI__CASE(3,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = 255;    } break;
         STBI__CASE(4,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
         STBI__CASE(4,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = src[3]; } break;
         STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                    } break;
         default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return stbi__errpuc("unsupported", "Unsupported format conversion");
      }
      #undef STBI__CASE
   }

   STBI_FREE(data);
   return good;
}
#endif
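
// Illustrative sketch, not part of stb_image: stbi__convert_format switches
// once per scanline on STBI__COMBO(img_n, req_comp), i.e. img_n*8 + req_comp,
// and then runs a tight per-pixel loop instead of switching per pixel. The
// hypothetical example_convert_row applies the same idea to three of the
// conversions (1->4, 3->1, 4->2) for a single row and returns 0 for pairs it
// does not handle.
static int example_convert_row(const unsigned char *src, int img_n,
                               unsigned char *dest, int req_comp, int width)
{
   int i;
   switch (img_n * 8 + req_comp) {
      case 1*8+4: // grey -> RGBA: replicate grey, opaque alpha
         for (i = 0; i < width; ++i, src += 1, dest += 4) {
            dest[0] = dest[1] = dest[2] = src[0];
            dest[3] = 255;
         }
         break;
      case 3*8+1: // RGB -> grey with the same 77/150/29 weights
         for (i = 0; i < width; ++i, src += 3, dest += 1)
            dest[0] = (unsigned char) ((src[0]*77 + src[1]*150 + src[2]*29) >> 8);
         break;
      case 4*8+2: // RGBA -> grey+alpha, alpha copied through
         for (i = 0; i < width; ++i, src += 4, dest += 2) {
            dest[0] = (unsigned char) ((src[0]*77 + src[1]*150 + src[2]*29) >> 8);
            dest[1] = src[3];
         }
         break;
      default:
         return 0;
   }
   return 1;
}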

#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
// nothing
#else
static stbi__uint16 stbi__compute_y_16(int r, int g, int b)
{
   return (stbi__uint16) (((r*77) + (g*150) +  (29*b)) >> 8);
}
#endif

#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
// nothing
#else
static stbi__uint16 *stbi__convert_format16(stbi__uint16 *data, int img_n, int req_comp, unsigned int x, unsigned int y)
{
   int i,j;
   stbi__uint16 *good;

   if (req_comp == img_n) return data;
   STBI_ASSERT(req_comp >= 1 && req_comp <= 4);

   good = (stbi__uint16 *) stbi__malloc(req_comp * x * y * 2);
   if (good == NULL) {
      STBI_FREE(data);
      return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
   }

   for (j=0; j < (int) y; ++j) {
      stbi__uint16 *src  = data + j * x * img_n   ;
      stbi__uint16 *dest = good + j * x * req_comp;

      #define STBI__COMBO(a,b)  ((a)*8+(b))
      #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
      // convert source image with img_n components to one with req_comp components;
      // avoid switch per pixel, so use switch per scanline and massive macros
      switch (STBI__COMBO(img_n, req_comp)) {
         STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=0xffff;                                     } break;
         STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
         STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=0xffff;                     } break;
         STBI__CASE(2,1) { dest[0]=src[0];                                                     } break;
         STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
         STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                     } break;
         STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=0xffff;        } break;
         STBI__CASE(3,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
         STBI__CASE(3,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = 0xffff; } break;
         STBI__CASE(4,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
         STBI__CASE(4,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = src[3]; } break;
         STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                       } break;
         default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return (stbi__uint16*) stbi__errpuc("unsupported", "Unsupported format conversion");
      }
      #undef STBI__CASE
   }

   STBI_FREE(data);
   return good;
}
#endif

#ifndef STBI_NO_LINEAR
static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
{
   int i,k,n;
   float *output;
   if (!data) return NULL;
   output = (float *) stbi__malloc_mad4(x, y, comp, sizeof(float), 0);
   if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
   // compute number of non-alpha components
   if (comp & 1) n = comp; else n = comp-1;
   for (i=0; i < x*y; ++i) {
      for (k=0; k < n; ++k) {
         output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
      }
   }
   if (n < comp) {
      for (i=0; i < x*y; ++i) {
         output[i*comp + n] = data[i*comp + n]/255.0f;
      }
   }
   STBI_FREE(data);
   return output;
}
#endif

#ifndef STBI_NO_HDR
#define stbi__float2int(x)   ((int) (x))
static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
{
   int i,k,n;
   stbi_uc *output;
   if (!data) return NULL;
   output = (stbi_uc *) stbi__malloc_mad3(x, y, comp, 0);
   if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
   // compute number of non-alpha components
   if (comp & 1) n = comp; else n = comp-1;
   for (i=0; i < x*y; ++i) {
      for (k=0; k < n; ++k) {
         float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
         if (z < 0) z = 0;
         if (z > 255) z = 255;
         output[i*comp + k] = (stbi_uc) stbi__float2int(z);
      }
      if (k < comp) {
         float z = data[i*comp+k] * 255 + 0.5f;
         if (z < 0) z = 0;
         if (z > 255) z = 255;
         output[i*comp + k] = (stbi_uc) stbi__float2int(z);
      }
   }
   STBI_FREE(data);
   return output;
}
#endif
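
// Illustrative sketch, not part of stb_image: for each non-alpha channel,
// stbi__ldr_to_hdr computes pow(c / 255, gamma) * scale and stbi__hdr_to_ldr
// inverts it with pow(v / scale, 1 / gamma) * 255 + 0.5, clamped to 0..255
// (the library keeps 1/scale and 1/gamma precomputed). Alpha, the last channel
// when comp is 2 or 4, is scaled linearly in both directions. The hypothetical
// helpers below assume gamma = 2.2 and scale = 1.0, which we believe are the
// defaults (changeable through stbi_ldr_to_hdr_gamma() and the related
// setters), and they assume v >= 0.
#include <math.h>

static float example_ldr_to_hdr(unsigned char c)
{
   return (float) pow(c / 255.0f, 2.2f) * 1.0f;                // scale = 1.0
}

static unsigned char example_hdr_to_ldr(float v)
{
   float z = (float) pow(v * 1.0f, 1.0f / 2.2f) * 255 + 0.5f;  // 1/scale = 1.0
   if (z < 0) z = 0;
   if (z > 255) z = 255;
   return (unsigned char) (int) z;
}
// With matching gamma and scale, every 8-bit value survives the round trip
// example_hdr_to_ldr(example_ldr_to_hdr(c)) == c.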

//////////////////////////////////////////////////////////////////////////////
//
//  "baseline" JPEG/JFIF decoder
//
//    simple implementation
//      - doesn't support delayed output of y-dimension
//      - simple interface (only one output format: 8-bit interleaved RGB)
//      - doesn't try to recover corrupt jpegs
//      - doesn't allow partial loading, loading multiple at once
//      - still fast on x86 (copying globals into locals doesn't help x86)
//      - allocates lots of intermediate memory (full size of all components)
//        - non-interleaved case requires this anyway
//        - allows good upsampling (see next)
//    high-quality
//      - upsampled channels are bilinearly interpolated, even across blocks
//      - quality integer IDCT derived from IJG's 'slow'
//    performance
//      - fast huffman; reasonable integer IDCT
//      - some SIMD kernels for common paths on targets with SSE2/NEON
//      - uses a lot of intermediate memory, could cache poorly

#ifndef STBI_NO_JPEG

// huffman decoding acceleration
#define FAST_BITS   9  // larger handles more cases; smaller stomps less cache

typedef struct
{
   stbi_uc  fast[1 << FAST_BITS];
   // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
   stbi__uint16 code[256];
   stbi_uc  values[256];
   stbi_uc  size[257];
   unsigned int maxcode[18];
   int    delta[17];   // old 'firstsymbol' - old 'firstcode'
} stbi__huffman;

typedef struct
{
   stbi__context *s;
   stbi__huffman huff_dc[4];
   stbi__huffman huff_ac[4];
   stbi__uint16 dequant[4][64];
   stbi__int16 fast_ac[4][1 << FAST_BITS];

// sizes for components, interleaved MCUs
   int img_h_max, img_v_max;
   int img_mcu_x, img_mcu_y;
   int img_mcu_w, img_mcu_h;

// definition of jpeg image component
   struct
   {
      int id;
      int h,v;
      int tq;
      int hd,ha;
      int dc_pred;

      int x,y,w2,h2;
      stbi_uc *data;
      void *raw_data, *raw_coeff;
      stbi_uc *linebuf;
      short   *coeff;   // progressive only
      int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
   } img_comp[4];

   stbi__uint32   code_buffer; // jpeg entropy-coded buffer
   int            code_bits;   // number of valid bits
   unsigned char  marker;      // marker seen while filling entropy buffer
   int            nomore;      // flag if we saw a marker so must stop

   int            progressive;
   int            spec_start;
   int            spec_end;
   int            succ_high;
   int            succ_low;
   int            eob_run;
   int            jfif;
   int            app14_color_transform; // Adobe APP14 tag
   int            rgb;

   int scan_n, order[4];
   int restart_interval, todo;

// kernels
   void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
   void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
   stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
} stbi__jpeg;

static int stbi__build_huffman(stbi__huffman *h, int *count)
{
   int i,j,k=0;
   unsigned int code;
   // build size list for each symbol (from JPEG spec)
   for (i=0; i < 16; ++i)
      for (j=0; j < count[i]; ++j)
         h->size[k++] = (stbi_uc) (i+1);
   h->size[k] = 0;

   // compute actual symbols (from jpeg spec)
   code = 0;
   k = 0;
   for(j=1; j <= 16; ++j) {
      // compute delta to add to code to compute symbol id
      h->delta[j] = k - code;
      if (h->size[k] == j) {
         while (h->size[k] == j)
            h->code[k++] = (stbi__uint16) (code++);
         if (code-1 >= (1u << j)) return stbi__err("bad code lengths","Corrupt JPEG");
      }
      // compute largest code + 1 for this size, preshifted as needed later
      h->maxcode[j] = code << (16-j);
      code <<= 1;
   }
   h->maxcode[j] = 0xffffffff;

   // build non-spec acceleration table; 255 is flag for not-accelerated
   memset(h->fast, 255, 1 << FAST_BITS);
   for (i=0; i < k; ++i) {
      int s = h->size[i];
      if (s <= FAST_BITS) {
         int c = h->code[i] << (FAST_BITS-s);
         int m = 1 << (FAST_BITS-s);
         for (j=0; j < m; ++j) {
            h->fast[c+j] = (stbi_uc) i;
         }
      }
   }
   return 1;
}
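
// Illustrative sketch, not part of stb_image: stbi__build_huffman assigns the
// canonical Huffman codes described in the JPEG specification. Symbols are
// taken in order of code length, codes of one length are consecutive
// integers, and moving to the next length appends a 0 bit (code <<= 1). The
// same pass records maxcode[] (one past the largest code of each length,
// left-justified to 16 bits) and delta[] (index of the first symbol of that
// length minus its first code). The hypothetical example_build_codes below
// reproduces only the code/size assignment; count[i] is the number of codes
// of length i+1, and -1 signals an invalid table.
static int example_build_codes(const int count[16], unsigned short code[256],
                               unsigned char size[256])
{
   int i, j, k = 0;
   unsigned int c = 0;
   for (i = 0; i < 16; ++i) {               // i+1 is the current code length
      for (j = 0; j < count[i]; ++j) {
         if (k >= 256) return -1;           // more symbols than a table holds
         size[k] = (unsigned char) (i + 1);
         code[k++] = (unsigned short) c++;
      }
      if (c > (1u << (i + 1))) return -1;   // codes no longer fit in i+1 bits
      c <<= 1;
   }
   return k;                                // number of symbols
}
// With the standard luminance DC table (counts 0,1,5,1,1,1,1,1,1,0,...), the
// first symbol gets 00, the next five get 010..110, and the last gets
// 111111110.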

// build a table that decodes both magnitude and value of small ACs in
// one go.
static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
{
   int i;
   for (i=0; i < (1 << FAST_BITS); ++i) {
      stbi_uc fast = h->fast[i];
      fast_ac[i] = 0;
      if (fast < 255) {
         int rs = h->values[fast];
         int run = (rs >> 4) & 15;
         int magbits = rs & 15;
         int len = h->size[fast];

         if (magbits && len + magbits <= FAST_BITS) {
            // magnitude code followed by receive_extend code
            int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
            int m = 1 << (magbits - 1);
            if (k < m) k += (~0U << magbits) + 1;
            // if the result is small enough, we can fit it in fast_ac table
            if (k >= -128 && k <= 127)
               fast_ac[i] = (stbi__int16) ((k * 256) + (run * 16) + (len + magbits));
         }
      }
   }
}
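
// Illustrative sketch, not part of stb_image: each stbi__build_fast_ac entry
// packs a fully decoded small AC coefficient into one 16-bit value,
//    entry = value * 256 + run * 16 + (huffman length + magnitude bits),
// so the signed value (-128..127) sits in the top byte, the zero run in bits
// 4..7 and the total number of bits to consume in bits 0..3 (at most
// FAST_BITS = 9, so 4 bits suffice). An entry of 0 means "take the slow
// path". The hypothetical helpers below pack and unpack such an entry; like
// the decoder, unpacking with >> 8 assumes arithmetic right shift of
// negative values.
static short example_pack_fast_ac(int value, int run, int nbits)
{
   return (short) ((value * 256) + (run * 16) + nbits);
}

static void example_unpack_fast_ac(short entry, int *value, int *run, int *nbits)
{
   int r = entry;
   *value = r >> 8;          // signed coefficient
   *run   = (r >> 4) & 15;   // zeros preceding it in zigzag order
   *nbits = r & 15;          // bits to drop from the code buffer
}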

static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
{
   do {
      unsigned int b = j->nomore ? 0 : stbi__get8(j->s);
      if (b == 0xff) {
         int c = stbi__get8(j->s);
         while (c == 0xff) c = stbi__get8(j->s); // consume fill bytes
         if (c != 0) {
            j->marker = (unsigned char) c;
            j->nomore = 1;
            return;
         }
      }
      j->code_buffer |= b << (24 - j->code_bits);
      j->code_bits += 8;
   } while (j->code_bits <= 24);
}
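
// Illustrative sketch, not part of stb_image: stbi__grow_buffer_unsafe undoes
// JPEG byte stuffing. Inside entropy-coded data a literal 0xFF is stored as
// 0xFF 0x00, while 0xFF followed by optional 0xFF fill bytes and then a
// non-zero byte is a marker (for example RSTn or EOI); the decoder records the
// marker and feeds zero bits from then on. The hypothetical example_unstuff
// applies those rules to an in-memory buffer, copying data bytes to out and
// returning the marker byte it stopped at, or -1 if it found none. Unlike the
// library, it simply stops on a 0xFF at the very end of the input.
static int example_unstuff(const unsigned char *in, int n,
                           unsigned char *out, int *out_len)
{
   int i = 0;
   *out_len = 0;
   while (i < n) {
      unsigned char b = in[i++];
      if (b == 0xff) {
         while (i < n && in[i] == 0xff) ++i;   // skip fill bytes
         if (i >= n) break;                   // truncated input
         if (in[i] != 0) return in[i];        // marker: stop here
         ++i;                                 // 0xFF 0x00 -> literal 0xFF
      }
      out[(*out_len)++] = b;
   }
   return -1;
}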

// (1 << n) - 1
static const stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};

// decode a jpeg huffman value from the bitstream
stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
{
   unsigned int temp;
   int c,k;

   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);

   // look at the top FAST_BITS and determine what symbol ID it is,
   // if the code is <= FAST_BITS
   c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   k = h->fast[c];
   if (k < 255) {
      int s = h->size[k];
      if (s > j->code_bits)
         return -1;
      j->code_buffer <<= s;
      j->code_bits -= s;
      return h->values[k];
   }

   // naive test is to shift the code_buffer down so k bits are
   // valid, then test against maxcode. To speed this up, we've
   // preshifted maxcode left so that it has (16-k) 0s at the
   // end; in other words, regardless of the number of bits, it
   // wants to be compared against something shifted to have 16;
   // that way we don't need to shift inside the loop.
   temp = j->code_buffer >> 16;
   for (k=FAST_BITS+1 ; ; ++k)
      if (temp < h->maxcode[k])
         break;
   if (k == 17) {
      // error! code not found
      j->code_bits -= 16;
      return -1;
   }

   if (k > j->code_bits)
      return -1;

   // convert the huffman code to the symbol id
   c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
   if (c < 0 || c >= 256) // symbol id out of range: corrupt huffman table
      return -1;
   STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);

   // convert the id to a symbol
   j->code_bits -= k;
   j->code_buffer <<= k;
   return h->values[c];
}

// bias[n] = (-1<<n) + 1
static const int stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};

// combined JPEG 'receive' and JPEG 'extend', since baseline
// always extends everything it receives.
stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
{
   unsigned int k;
   int sgn;
   // reject sizes that would index past stbi__jbias (corrupt DC sizes can
   // exceed 15) before rotating or masking anything
   if (n < 0 || n >= (int) (sizeof(stbi__jbias)/sizeof(*stbi__jbias))) return 0;
   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);

   sgn = (stbi__int32)j->code_buffer >> 31; // sign bit is always in MSB
   k = stbi_lrot(j->code_buffer, n);
   j->code_buffer = k & ~stbi__bmask[n];
   k &= stbi__bmask[n];
   j->code_bits -= n;
   return k + (stbi__jbias[n] & ~sgn);
}
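
// Illustrative sketch, not part of stb_image: stbi__extend_receive fuses the
// JPEG RECEIVE and EXTEND steps. RECEIVE reads n raw bits v; EXTEND treats a
// leading 0 bit (v < 2^(n-1)) as a negative value and adds (-1 << n) + 1,
// which is the stbi__jbias[n] entry. The decoder above replaces that
// comparison with a mask built from the top bit of the code buffer. The
// hypothetical example_extend states the rule directly, assuming
// 1 <= n <= 15.
static int example_extend(unsigned int v, int n)
{
   int vt = 1 << (n - 1);                 // smallest value with a leading 1 bit
   if ((int) v < vt)
      return (int) v - ((1 << n) - 1);    // e.g. n = 3: 0..3 -> -7..-4
   return (int) v;                        // e.g. n = 3: 4..7 -> 4..7
}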

// get some unsigned bits
stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
{
   unsigned int k;
   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
   k = stbi_lrot(j->code_buffer, n);
   j->code_buffer = k & ~stbi__bmask[n];
   k &= stbi__bmask[n];
   j->code_bits -= n;
   return k;
}

stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
{
   unsigned int k;
   if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
   k = j->code_buffer;
   j->code_buffer <<= 1;
   --j->code_bits;
   return k & 0x80000000;
}

// given a value that's at position X in the zigzag stream,
// where does it appear in the 8x8 matrix coded as row-major?
static const stbi_uc stbi__jpeg_dezigzag[64+15] =
{
    0,  1,  8, 16,  9,  2,  3, 10,
   17, 24, 32, 25, 18, 11,  4,  5,
   12, 19, 26, 33, 40, 48, 41, 34,
   27, 20, 13,  6,  7, 14, 21, 28,
   35, 42, 49, 56, 57, 50, 43, 36,
   29, 22, 15, 23, 30, 37, 44, 51,
   58, 59, 52, 45, 38, 31, 39, 46,
   53, 60, 61, 54, 47, 55, 62, 63,
   // let corrupt input sample past end
   63, 63, 63, 63, 63, 63, 63, 63,
   63, 63, 63, 63, 63, 63, 63
};
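
// Illustrative sketch, not part of stb_image: the first 64 entries of
// stbi__jpeg_dezigzag follow the JPEG zigzag scan, which walks the
// anti-diagonals row+col = s of the 8x8 block and alternates direction: row
// ascending when s is odd, descending when s is even. The 15 trailing 63s let
// a run of up to 15 on corrupt input still index inside the table. The
// hypothetical example_build_dezigzag regenerates the 64 real entries as
// row-major indices.
static void example_build_dezigzag(unsigned char out[64])
{
   int s, row, n = 0;
   for (s = 0; s < 15; ++s) {
      int lo = s < 8 ? 0 : s - 7;   // smallest valid row on this diagonal
      int hi = s < 8 ? s : 7;       // largest valid row on this diagonal
      if (s & 1)
         for (row = lo; row <= hi; ++row) out[n++] = (unsigned char) (row * 8 + (s - row));
      else
         for (row = hi; row >= lo; --row) out[n++] = (unsigned char) (row * 8 + (s - row));
   }
}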

// decode one 64-entry block--
static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi__uint16 *dequant)
{
   int diff,dc,k;
   int t;

   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   t = stbi__jpeg_huff_decode(j, hdc);
   if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG");

   // 0 all the ac values now so we can do it 32-bits at a time
   memset(data,0,64*sizeof(data[0]));

   diff = t ? stbi__extend_receive(j, t) : 0;
   dc = j->img_comp[b].dc_pred + diff;
   j->img_comp[b].dc_pred = dc;
   data[0] = (short) (dc * dequant[0]);

   // decode AC components, see JPEG spec
   k = 1;
   do {
      unsigned int zig;
      int c,r,s;
      if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
      c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
      r = fac[c];
      if (r) { // fast-AC path
         k += (r >> 4) & 15; // run
         s = r & 15; // combined length
         j->code_buffer <<= s;
         j->code_bits -= s;
         // decode into unzigzag'd location
         zig = stbi__jpeg_dezigzag[k++];
         data[zig] = (short) ((r >> 8) * dequant[zig]);
      } else {
         int rs = stbi__jpeg_huff_decode(j, hac);
         if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
         s = rs & 15;
         r = rs >> 4;
         if (s == 0) {
            if (rs != 0xf0) break; // end block
            k += 16;
         } else {
            k += r;
            // decode into unzigzag'd location
            zig = stbi__jpeg_dezigzag[k++];
            data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
         }
      }
   } while (k < 64);
   return 1;
}
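
// Illustrative sketch, not part of stb_image: after the DC difference,
// stbi__jpeg_decode_block reads AC coefficients as (RS, value) pairs. The high
// nibble R of the RS symbol counts zero coefficients to skip and the low
// nibble S is the bit size of the value that follows; RS = 0x00 ends the
// block (EOB) and RS = 0xF0 skips 16 zeros (ZRL). The hypothetical
// example_expand_ac applies those rules to already-decoded symbols, a
// stand-in for the Huffman and bit reads, and leaves the block in zigzag
// order, whereas the library dezigzags and dequantizes each value as it
// stores it. It returns 0 if a run would run past position 63.
static int example_expand_ac(const unsigned char *rs, const int *values,
                             int nsym, short block[64])
{
   int i, k = 1;
   for (i = 0; i < 64; ++i) block[i] = 0;
   for (i = 0; i < nsym && k < 64; ++i) {
      int r = rs[i] >> 4, s = rs[i] & 15;
      if (s == 0) {
         if (rs[i] != 0xf0) break;   // EOB: the rest of the block stays zero
         k += 16;                    // ZRL: sixteen zeros
      } else {
         k += r;                     // skip r zeros
         if (k > 63) return 0;       // corrupt: ran past the block
         block[k++] = (short) values[i];
      }
   }
   return 1;
}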

static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
{
   int diff,dc;
   int t;
   if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");

   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);

   if (j->succ_high == 0) {
      // first scan for DC coefficient, must be first
      memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
      t = stbi__jpeg_huff_decode(j, hdc);
      if (t == -1) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
      diff = t ? stbi__extend_receive(j, t) : 0;

      dc = j->img_comp[b].dc_pred + diff;
      j->img_comp[b].dc_pred = dc;
      data[0] = (short) (dc << j->succ_low);
   } else {
      // refinement scan for DC coefficient
      if (stbi__jpeg_get_bit(j))
         data[0] += (short) (1 << j->succ_low);
   }
   return 1;
}

// @OPTIMIZE: store non-zigzagged during the decode passes,
// and only de-zigzag when dequantizing
static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
{
   int k;
   if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");

   if (j->succ_high == 0) {
      int shift = j->succ_low;

      if (j->eob_run) {
         --j->eob_run;
         return 1;
      }

      k = j->spec_start;
      do {
         unsigned int zig;
         int c,r,s;
         if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
         c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
         r = fac[c];
         if (r) { // fast-AC path
            k += (r >> 4) & 15; // run
            s = r & 15; // combined length
            j->code_buffer <<= s;
            j->code_bits -= s;
            zig = stbi__jpeg_dezigzag[k++];
            data[zig] = (short) ((r >> 8) << shift);
         } else {
            int rs = stbi__jpeg_huff_decode(j, hac);
            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
            s = rs & 15;
            r = rs >> 4;
            if (s == 0) {
               if (r < 15) {
                  j->eob_run = (1 << r);
                  if (r)
                     j->eob_run += stbi__jpeg_get_bits(j, r);
                  --j->eob_run;
                  break;
               }
               k += 16;
            } else {
               k += r;
               zig = stbi__jpeg_dezigzag[k++];
               data[zig] = (short) (stbi__extend_receive(j,s) << shift);
            }
         }
      } while (k <= j->spec_end);
   } else {
      // refinement scan for these AC coefficients

      short bit = (short) (1 << j->succ_low);

      if (j->eob_run) {
         --j->eob_run;
         for (k = j->spec_start; k <= j->spec_end; ++k) {
            short *p = &data[stbi__jpeg_dezigzag[k]];
            if (*p != 0)
               if (stbi__jpeg_get_bit(j))
                  if ((*p & bit)==0) {
                     if (*p > 0)
                        *p += bit;
                     else
                        *p -= bit;
                  }
         }
      } else {
         k = j->spec_start;
         do {
            int r,s;
            int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
            s = rs & 15;
            r = rs >> 4;
            if (s == 0) {
               if (r < 15) {
                  j->eob_run = (1 << r) - 1;
                  if (r)
                     j->eob_run += stbi__jpeg_get_bits(j, r);
                  r = 64; // force end of block
               } else {
                  // r=15 s=0 should write 16 0s, so we just do
                  // a run of 15 0s and then write s (which is 0),
                  // so we don't have to do anything special here
               }
            } else {
               if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
               // sign bit
               if (stbi__jpeg_get_bit(j))
                  s = bit;
               else
                  s = -bit;
            }

            // advance by r
            while (k <= j->spec_end) {
               short *p = &data[stbi__jpeg_dezigzag[k++]];
               if (*p != 0) {
                  if (stbi__jpeg_get_bit(j))
                     if ((*p & bit)==0) {
                        if (*p > 0)
                           *p += bit;
                        else
                           *p -= bit;
                     }
               } else {
                  if (r == 0) {
                     *p = (short) s;
                     break;
                  }
                  --r;
               }
            }
         } while (k <= j->spec_end);
      }
   }
   return 1;
}
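
// Illustrative sketch, not part of stb_image: in a progressive refinement scan
// (succ_high != 0) every coefficient that is already non-zero receives one
// correction bit; if that bit is set and the bit at position succ_low is
// still clear, the magnitude grows by bit = 1 << succ_low, away from zero.
// A coefficient that becomes non-zero in this scan is set to +bit or -bit
// according to a sign bit. The hypothetical helpers below show both updates
// on a single coefficient.
static short example_refine_nonzero(short coef, int correction, int succ_low)
{
   short bit = (short) (1 << succ_low);
   if (coef != 0 && correction && (coef & bit) == 0)
      coef = (short) (coef > 0 ? coef + bit : coef - bit);
   return coef;
}

static short example_new_coefficient(int sign_bit, int succ_low)
{
   return (short) (sign_bit ? (1 << succ_low) : -(1 << succ_low));
}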

// clamp an int to 0..255; the IDCT adds the +128 level shift before calling
// this, so a -128..127 sample ends up in 0..255
stbi_inline static stbi_uc stbi__clamp(int x)
{
   // trick to use a single test to catch both cases
   if ((unsigned int) x > 255) {
      if (x < 0) return 0;
      if (x > 255) return 255;
   }
   return (stbi_uc) x;
}
2351
2352 #define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
2353 #define stbi__fsh(x)  ((x) * 4096)
2354
2355 // derived from jidctint -- DCT_ISLOW
2356 #define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
2357    int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
2358    p2 = s2;                                    \
2359    p3 = s6;                                    \
2360    p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
2361    t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
2362    t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
2363    p2 = s0;                                    \
2364    p3 = s4;                                    \
2365    t0 = stbi__fsh(p2+p3);                      \
2366    t1 = stbi__fsh(p2-p3);                      \
2367    x0 = t0+t3;                                 \
2368    x3 = t0-t3;                                 \
2369    x1 = t1+t2;                                 \
2370    x2 = t1-t2;                                 \
2371    t0 = s7;                                    \
2372    t1 = s5;                                    \
2373    t2 = s3;                                    \
2374    t3 = s1;                                    \
2375    p3 = t0+t2;                                 \
2376    p4 = t1+t3;                                 \
2377    p1 = t0+t3;                                 \
2378    p2 = t1+t2;                                 \
2379    p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
2380    t0 = t0*stbi__f2f( 0.298631336f);           \
2381    t1 = t1*stbi__f2f( 2.053119869f);           \
2382    t2 = t2*stbi__f2f( 3.072711026f);           \
2383    t3 = t3*stbi__f2f( 1.501321110f);           \
2384    p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
2385    p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
2386    p3 = p3*stbi__f2f(-1.961570560f);           \
2387    p4 = p4*stbi__f2f(-0.390180644f);           \
2388    t3 += p1+p4;                                \
2389    t2 += p2+p3;                                \
2390    t1 += p2+p4;                                \
2391    t0 += p1+p3;
2392
static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
{
   int i,val[64],*v=val;
   stbi_uc *o;
   short *d = data;

   // columns
   for (i=0; i < 8; ++i,++d, ++v) {
      // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
      if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
           && d[40]==0 && d[48]==0 && d[56]==0) {
         //    no shortcut                 0     seconds
         //    (1|2|3|4|5|6|7)==0          0     seconds
         //    all separate               -0.047 seconds
         //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
         int dcterm = d[0]*4;
         v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
      } else {
         STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
         // constants scaled things up by 1<<12; let's bring them back
         // down, but keep 2 extra bits of precision
         x0 += 512; x1 += 512; x2 += 512; x3 += 512;
         v[ 0] = (x0+t3) >> 10;
         v[56] = (x0-t3) >> 10;
         v[ 8] = (x1+t2) >> 10;
         v[48] = (x1-t2) >> 10;
         v[16] = (x2+t1) >> 10;
         v[40] = (x2-t1) >> 10;
         v[24] = (x3+t0) >> 10;
         v[32] = (x3-t0) >> 10;
      }
   }

   for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
      // no fast case since the first 1D IDCT spread components out
      STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
      // constants scaled things up by 1<<12, plus we had 1<<2 from first
      // loop, plus horizontal and vertical each scale by sqrt(8) so together
      // we've got an extra 1<<3, so 1<<17 total we need to remove.
      // so we want to round that, which means adding 0.5 * 1<<17,
      // aka 65536. Also, we'll end up with -128 to 127 that we want
      // to encode as 0..255 by adding 128, so we'll add that before the shift
      x0 += 65536 + (128<<17);
      x1 += 65536 + (128<<17);
      x2 += 65536 + (128<<17);
      x3 += 65536 + (128<<17);
      // tried computing the shifts into temps, or'ing the temps to see
      // if any were out of range, but that was slower
      o[0] = stbi__clamp((x0+t3) >> 17);
      o[7] = stbi__clamp((x0-t3) >> 17);
      o[1] = stbi__clamp((x1+t2) >> 17);
      o[6] = stbi__clamp((x1-t2) >> 17);
      o[2] = stbi__clamp((x2+t1) >> 17);
      o[5] = stbi__clamp((x2-t1) >> 17);
      o[3] = stbi__clamp((x3+t0) >> 17);
      o[4] = stbi__clamp((x3-t0) >> 17);
   }
}
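The shift bookkeeping in `stbi__idct_block` can be checked on the simplest input: a block whose only nonzero coefficient is DC. The sketch below is a minimal one and assumes arithmetic right shifts on negative `int` (as the library itself does). It follows the DC-only path by hand: the column shortcut stores `d0*4`, the row pass scales by `1<<12`, and the `65536 + (128<<17)` bias and `>>17` are applied last. It then compares the result with the exact value `floor(d0/8 + 0.5) + 128`. The helpers `clamp_u8`, `dc_only_fixed` and `dc_only_exact` are hypothetical and are not part of stb_image.

```c
#include <assert.h>

/* Hypothetical stand-in for stbi__clamp: saturate to 0..255. */
static int clamp_u8(int x)
{
   if (x < 0) return 0;
   if (x > 255) return 255;
   return x;
}

/* DC-only block through the integer path of stbi__idct_block.
   Column shortcut: every row receives dcterm = d0*4 (2 extra bits).
   Row pass: with only v[0] nonzero, the even part gives x0 = v[0] << 12
   and every odd term is zero, so each pixel is (x0 + bias) >> 17. */
static int dc_only_fixed(int d0)
{
   int dcterm = d0 * 4;
   int x0 = dcterm * 4096;
   x0 += 65536 + (128 << 17);     /* round to nearest, then +128 level shift */
   return clamp_u8(x0 >> 17);     /* assumes >> is arithmetic for negatives */
}

/* Exact reference: a DC-only 8x8 IDCT equals F(0,0)/8 at every pixel.
   floor(d0/8 + 0.5) is computed with integers to avoid libm. */
static int dc_only_exact(int d0)
{
   int num = d0 + 4;                       /* (d0/8 + 0.5) * 8 */
   int q = num / 8;                        /* C division truncates toward 0 */
   if (num % 8 != 0 && num < 0) q -= 1;    /* adjust to floor */
   return clamp_u8(q + 128);
}
```

The same arithmetic suggests why the column shortcut is exact: the full column pass would compute `((d0 << 12) + 512) >> 10`, which is also `d0*4`.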

#ifdef STBI_SSE2
// sse2 integer IDCT. not the fastest possible implementation but it
// produces bit-identical results to the generic C version so it's
// fully "transparent".
static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
{
   // This is constructed to match our regular (generic) integer IDCT exactly.
   __m128i row0, row1, row2, row3, row4, row5, row6, row7;
   __m128i tmp;

   // dot product constant: even elems=x, odd elems=y
   #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))

   // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
   // out(1) = c1[even]*x + c1[odd]*y
   #define dct_rot(out0,out1, x,y,c0,c1) \
      __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
      __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
      __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
      __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
      __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
      __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)

   // out = in << 12  (in 16-bit, out 32-bit)
   #define dct_widen(out, in) \
      __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
      __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)

   // wide add
   #define dct_wadd(out, a, b) \
      __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
      __m128i out##_h = _mm_add_epi32(a##_h, b##_h)

   // wide sub
   #define dct_wsub(out, a, b) \
      __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
      __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)

   // butterfly a/b, add bias, then shift by "s" and pack
   #define dct_bfly32o(out0, out1, a,b,bias,s) \
      { \
         __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
         __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
         dct_wadd(sum, abiased, b); \
         dct_wsub(dif, abiased, b); \
         out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
         out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
      }

   // 8-bit interleave step (for transposes)
   #define dct_interleave8(a, b) \
      tmp = a; \
      a = _mm_unpacklo_epi8(a, b); \
      b = _mm_unpackhi_epi8(tmp, b)

   // 16-bit interleave step (for transposes)
   #define dct_interleave16(a, b) \
      tmp = a; \
      a = _mm_unpacklo_epi16(a, b); \
      b = _mm_unpackhi_epi16(tmp, b)

   #define dct_pass(bias,shift) \
      { \
         /* even part */ \
         dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
         __m128i sum04 = _mm_add_epi16(row0, row4); \
         __m128i dif04 = _mm_sub_epi16(row0, row4); \
         dct_widen(t0e, sum04); \
         dct_widen(t1e, dif04); \
         dct_wadd(x0, t0e, t3e); \
         dct_wsub(x3, t0e, t3e); \
         dct_wadd(x1, t1e, t2e); \
         dct_wsub(x2, t1e, t2e); \
         /* odd part */ \
         dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
         dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
         __m128i sum17 = _mm_add_epi16(row1, row7); \
         __m128i sum35 = _mm_add_epi16(row3, row5); \
         dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
         dct_wadd(x4, y0o, y4o); \
         dct_wadd(x5, y1o, y5o); \
         dct_wadd(x6, y2o, y5o); \
         dct_wadd(x7, y3o, y4o); \
         dct_bfly32o(row0,row7, x0,x7,bias,shift); \
         dct_bfly32o(row1,row6, x1,x6,bias,shift); \
         dct_bfly32o(row2,row5, x2,x5,bias,shift); \
         dct_bfly32o(row3,row4, x3,x4,bias,shift); \
      }

   __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
   __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
   __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
   __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
   __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
   __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
   __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
   __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));

   // rounding biases in column/row passes, see stbi__idct_block for explanation.
   __m128i bias_0 = _mm_set1_epi32(512);
   __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));

   // load
   row0 = _mm_load_si128((const __m128i *) (data + 0*8));
   row1 = _mm_load_si128((const __m128i *) (data + 1*8));
   row2 = _mm_load_si128((const __m128i *) (data + 2*8));
   row3 = _mm_load_si128((const __m128i *) (data + 3*8));
   row4 = _mm_load_si128((const __m128i *) (data + 4*8));
   row5 = _mm_load_si128((const __m128i *) (data + 5*8));
   row6 = _mm_load_si128((const __m128i *) (data + 6*8));
   row7 = _mm_load_si128((const __m128i *) (data + 7*8));

   // column pass
   dct_pass(bias_0, 10);

   {
      // 16bit 8x8 transpose pass 1
      dct_interleave16(row0, row4);
      dct_interleave16(row1, row5);
      dct_interleave16(row2, row6);
      dct_interleave16(row3, row7);

      // transpose pass 2
      dct_interleave16(row0, row2);
      dct_interleave16(row1, row3);
      dct_interleave16(row4, row6);
      dct_interleave16(row5, row7);

      // transpose pass 3
      dct_interleave16(row0, row1);
      dct_interleave16(row2, row3);
      dct_interleave16(row4, row5);
      dct_interleave16(row6, row7);
   }

   // row pass
   dct_pass(bias_1, 17);

   {
      // pack
      __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
      __m128i p1 = _mm_packus_epi16(row2, row3);
      __m128i p2 = _mm_packus_epi16(row4, row5);
      __m128i p3 = _mm_packus_epi16(row6, row7);

      // 8bit 8x8 transpose pass 1
      dct_interleave8(p0, p2); // a0e0a1e1...
      dct_interleave8(p1, p3); // c0g0c1g1...

      // transpose pass 2
      dct_interleave8(p0, p1); // a0c0e0g0...
      dct_interleave8(p2, p3); // b0d0f0h0...

      // transpose pass 3
      dct_interleave8(p0, p2); // a0b0c0d0...
      dct_interleave8(p1, p3); // a4b4c4d4...

      // store
      _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
   }

#undef dct_const
#undef dct_rot
#undef dct_widen
#undef dct_wadd
#undef dct_wsub
#undef dct_bfly32o
#undef dct_interleave8
#undef dct_interleave16
#undef dct_pass
}

#endif // STBI_SSE2
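The SSE2 path multiplies interleaved pairs of 16-bit inputs with `_mm_madd_epi16`, so each rotation has to take the form `x*c0 + y*c1`. The `rot0_0`/`rot0_1` constants achieve this by folding the shared product `p1 = (s2+s6)*c(0.5411961)` from the scalar even part into the coefficient pair. The portable scalar sketch below uses no intrinsics. It assumes that `f2f` matches the library's `stbi__f2f` (float to 4.12 fixed point), and `madd_pair`, `scalar_t2_t3` and `simd_style_t2_t3` are hypothetical helpers. It checks that both forms agree exactly in integer arithmetic, which is what lets the SIMD output be bit-identical.

```c
#include <assert.h>

/* Assumed to match stbi__f2f: float -> 4.12 fixed point. */
#define f2f(x) ((int) (((x) * 4096 + 0.5)))

/* One 32-bit lane of _mm_madd_epi16: an interleaved (x, y) input pair
   times a (cx, cy) constant pair as built by dct_const, summed. */
static int madd_pair(short x, short y, short cx, short cy)
{
   return (int) x * cx + (int) y * cy;
}

/* Even-part rotation written the scalar way (shared product p1). */
static void scalar_t2_t3(int s2, int s6, int *t2, int *t3)
{
   int p1 = (s2 + s6) * f2f(0.5411961f);
   *t2 = p1 + s6 * f2f(-1.847759065f);
   *t3 = p1 + s2 * f2f( 0.765366865f);
}

/* Same rotation with the folded rot0_0 / rot0_1 coefficient pairs. */
static void simd_style_t2_t3(short s2, short s6, int *t2, int *t3)
{
   *t2 = madd_pair(s2, s6, (short) f2f(0.5411961f),
                   (short) (f2f(0.5411961f) + f2f(-1.847759065f)));
   *t3 = madd_pair(s2, s6, (short) (f2f(0.5411961f) + f2f(0.765366865f)),
                   (short) f2f(0.5411961f));
}
```

The odd-part constants (`rot1_*` through `rot3_*`) appear to follow the same pattern: each adds a shared factor into one side of its coefficient pair so that the result still fits a single multiply-add.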

#ifdef STBI_NEON

// NEON integer IDCT. should produce bit-identical
// results to the generic C version.
static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
{
   int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;

   int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
   int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
   int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
   int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
   int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
   int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
   int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
   int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
   int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
   int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
   int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
   int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));

#define dct_long_mul(out, inq, coeff) \
   int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
   int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)

#define dct_long_mac(out, acc, inq, coeff) \
   int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
   int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)

#define dct_widen(out, inq) \
   int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
   int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)

// wide add
#define dct_wadd(out, a, b) \
   int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
   int32x4_t out##_h = vaddq_s32(a##_h, b##_h)

// wide sub
#define dct_wsub(out, a, b) \
   int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
   int32x4_t out##_h = vsubq_s32(a##_h, b##_h)

// butterfly a/b, then shift using "shiftop" by "s" and pack
#define dct_bfly32o(out0,out1, a,b,shiftop,s) \
   { \
      dct_wadd(sum, a, b); \
      dct_wsub(dif, a, b); \
      out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
      out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
   }

#define dct_pass(shiftop, shift) \
   { \
      /* even part */ \
      int16x8_t sum26 = vaddq_s16(row2, row6); \
      dct_long_mul(p1e, sum26, rot0_0); \
      dct_long_mac(t2e, p1e, row6, rot0_1); \
      dct_long_mac(t3e, p1e, row2, rot0_2); \
      int16x8_t sum04 = vaddq_s16(row0, row4); \
      int16x8_t dif04 = vsubq_s16(row0, row4); \
      dct_widen(t0e, sum04); \
      dct_widen(t1e, dif04); \
      dct_wadd(x0, t0e, t3e); \
      dct_wsub(x3, t0e, t3e); \
      dct_wadd(x1, t1e, t2e); \
      dct_wsub(x2, t1e, t2e); \
      /* odd part */ \
      int16x8_t sum15 = vaddq_s16(row1, row5); \
      int16x8_t sum17 = vaddq_s16(row1, row7); \
      int16x8_t sum35 = vaddq_s16(row3, row5); \
      int16x8_t sum37 = vaddq_s16(row3, row7); \
      int16x8_t sumodd = vaddq_s16(sum17, sum35); \
      dct_long_mul(p5o, sumodd, rot1_0); \
      dct_long_mac(p1o, p5o, sum17, rot1_1); \
      dct_long_mac(p2o, p5o, sum35, rot1_2); \
      dct_long_mul(p3o, sum37, rot2_0); \
      dct_long_mul(p4o, sum15, rot2_1); \
      dct_wadd(sump13o, p1o, p3o); \
      dct_wadd(sump24o, p2o, p4o); \
      dct_wadd(sump23o, p2o, p3o); \
      dct_wadd(sump14o, p1o, p4o); \
      dct_long_mac(x4, sump13o, row7, rot3_0); \
      dct_long_mac(x5, sump24o, row5, rot3_1); \
      dct_long_mac(x6, sump23o, row3, rot3_2); \
      dct_long_mac(x7, sump14o, row1, rot3_3); \
      dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
      dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
      dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
      dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
   }

   // load
   row0 = vld1q_s16(data + 0*8);
   row1 = vld1q_s16(data + 1*8);
   row2 = vld1q_s16(data + 2*8);
   row3 = vld1q_s16(data + 3*8);
   row4 = vld1q_s16(data + 4*8);
   row5 = vld1q_s16(data + 5*8);
   row6 = vld1q_s16(data + 6*8);
   row7 = vld1q_s16(data + 7*8);

   // add DC bias
   row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));

   // column pass
   dct_pass(vrshrn_n_s32, 10);

   // 16bit 8x8 transpose
   {
// these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
// whether compilers actually get this is another story, sadly.
#define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
#define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
#define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }

      // pass 1
      dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
      dct_trn16(row2, row3);
      dct_trn16(row4, row5);
      dct_trn16(row6, row7);

      // pass 2
      dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
      dct_trn32(row1, row3);
      dct_trn32(row4, row6);
      dct_trn32(row5, row7);

      // pass 3
      dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
      dct_trn64(row1, row5);
      dct_trn64(row2, row6);
      dct_trn64(row3, row7);

#undef dct_trn16
#undef dct_trn32
#undef dct_trn64
   }

   // row pass
   // vrshrn_n_s32 only supports shifts up to 16, we need
   // 17. so do a non-rounding shift of 16 first then follow
   // up with a rounding shift by 1.
   dct_pass(vshrn_n_s32, 16);

   {
      // pack and round
      uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
      uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
      uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
      uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
      uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
      uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
      uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
      uint8x8_t p7 = vqrshrun_n_s16(row7, 1);

      // again, these can translate into one instruction, but often don't.
#define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
#define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
#define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }

      // sadly can't use interleaved stores here since we only write
      // 8 bytes to each scan line!

      // 8x8 8-bit transpose pass 1
      dct_trn8_8(p0, p1);
      dct_trn8_8(p2, p3);
      dct_trn8_8(p4, p5);
      dct_trn8_8(p6, p7);

      // pass 2
      dct_trn8_16(p0, p2);
      dct_trn8_16(p1, p3);
      dct_trn8_16(p4, p6);
      dct_trn8_16(p5, p7);

      // pass 3
      dct_trn8_32(p0, p4);
      dct_trn8_32(p1, p5);
      dct_trn8_32(p2, p6);
      dct_trn8_32(p3, p7);

      // store
      vst1_u8(out, p0); out += out_stride;
      vst1_u8(out, p1); out += out_stride;
      vst1_u8(out, p2); out += out_stride;
      vst1_u8(out, p3); out += out_stride;
      vst1_u8(out, p4); out += out_stride;
      vst1_u8(out, p5); out += out_stride;
      vst1_u8(out, p6); out += out_stride;
      vst1_u8(out, p7);

#undef dct_trn8_8
#undef dct_trn8_16
#undef dct_trn8_32
   }

#undef dct_long_mul
#undef dct_long_mac
#undef dct_widen
#undef dct_wadd
#undef dct_wsub
#undef dct_bfly32o
#undef dct_pass
}

#endif // STBI_NEON
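The NEON version reproduces the generic rounding without explicit bias vectors. The column pass gets its rounding from `vrshrn_n_s32`. The `1024` added to the DC coefficient becomes the `128<<17` level shift of the row pass, because the column pass leaves 2 extra bits and the row pass scales by `1<<12` (`1024 * 4 * 4096 == 128 << 17`). The 17-bit rounding shift is split into a truncating shift by 16 followed by a rounding shift by 1. The scalar sketch below covers that last step. It assumes arithmetic `>>` on negative values and ignores the 16-bit narrowing and the 0..255 saturation that the intrinsics also perform (`vqrshrun_n_s16` saturates, which plays the role of `stbi__clamp`). The names `round_shr17` and `split_shr17` are hypothetical.

```c
#include <assert.h>

/* Rounding right shift by 17, as the generic code does with +65536. */
static int round_shr17(int x)
{
   return (x + 65536) >> 17;
}

/* NEON-style: truncating shift by 16 (like vshrn_n_s32), then a rounding
   shift by 1 (the rounding part of vqrshrun_n_s16). Both assume
   arithmetic >> on negative values. */
static int split_shr17(int x)
{
   int y = x >> 16;
   return (y + 1) >> 1;
}
```

The two agree because `floor((floor(a) + 1) / 2) == floor((a + 1) / 2)` for any real `a`, with `a = x / 65536`.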

#define STBI__MARKER_none  0xff
// if there's a pending marker from the entropy stream, return that
// otherwise, fetch from the stream and get a marker. if there's no
// marker, return 0xff, which is never a valid marker value
static stbi_uc stbi__get_marker(stbi__jpeg *j)
{
   stbi_uc x;
   if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
   x = stbi__get8(j->s);
   if (x != 0xff) return STBI__MARKER_none;
   while (x == 0xff)
      x = stbi__get8(j->s); // consume repeated 0xff fill bytes
   return x;
}
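The marker logic can be exercised on a plain byte buffer. This is a minimal sketch under stated assumptions. `struct byte_reader`, `read8` and `get_marker` are hypothetical stand-ins for `stbi__context`, `stbi__get8` and `stbi__get_marker`. The reader returns 0 past the end of input, as `stbi__get8` does at EOF. The pending `j->marker` slot is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MARKER_NONE 0xff   /* mirrors STBI__MARKER_none */

struct byte_reader { const unsigned char *buf; size_t len, pos; };

static unsigned char read8(struct byte_reader *r)
{
   return r->pos < r->len ? r->buf[r->pos++] : 0;   /* 0 at end of input */
}

/* Returns the marker code that follows 0xFF (skipping fill bytes),
   or MARKER_NONE if the next byte does not start a marker. */
static unsigned char get_marker(struct byte_reader *r)
{
   unsigned char x = read8(r);
   if (x != 0xff) return MARKER_NONE;
   while (x == 0xff)
      x = read8(r);       /* consume repeated 0xFF fill bytes */
   return x;
}
```

Because the loop only exits on a byte other than 0xFF, 0xFF can never come back as a marker code. That is why it is safe to use as the "no marker" sentinel.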

// in each scan, we'll have scan_n components, and the order
// of the components is specified by order[]
// STBI__RESTART(x) is true for the restart markers RST0..RST7 (0xD0..0xD7)
#define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)

// after a restart interval, reset the entropy decoder and
// the dc prediction
static void stbi__jpeg_reset(stbi__jpeg *j)
{
   j->code_bits = 0;
   j->code_buffer = 0;
   j->nomore = 0;
   j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = j->img_comp[3].dc_pred = 0;
   j->marker = STBI__MARKER_none;
   j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
   j->eob_run = 0;
   // no more than 1<<31 MCUs if no restart_interval? that's plenty safe,
   // since we don't even allow 1<<30 pixels
}
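The DC predictor has to be reset because DC coefficients are coded as differences from the previous block of the same component. After each restart marker, decoding therefore starts again from a prediction of 0. The simplified sketch below assumes one block per MCU and mirrors the `todo` countdown used in `stbi__parse_entropy_coded_data`. All names (`struct rst_state`, `rst_reset`, `decode_dc_run`) are hypothetical.

```c
#include <assert.h>

struct rst_state { int dc_pred; int todo; int restart_interval; };

static void rst_reset(struct rst_state *s)
{
   s->dc_pred = 0;
   /* no restart interval: effectively never count down to zero */
   s->todo = s->restart_interval ? s->restart_interval : 0x7fffffff;
}

/* Turns n coded DC differences into absolute DC values, resetting the
   predictor every restart_interval MCUs. Returns the number of resets. */
static int decode_dc_run(const int *diff, int *dc, int n, int restart_interval)
{
   struct rst_state s;
   int i, resets = 0;
   s.restart_interval = restart_interval;
   rst_reset(&s);
   for (i = 0; i < n; ++i) {
      s.dc_pred += diff[i];
      dc[i] = s.dc_pred;
      if (--s.todo <= 0) {   /* the real decoder also requires an RSTn marker here */
         rst_reset(&s);
         ++resets;
      }
   }
   return resets;
}
```

With an interval of 2, the differences `5, 3, 7, 1, 2` decode to `5, 8, 7, 8, 2`. Without restarts, they accumulate to `5, 8, 15, 16, 18`.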

static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
{
   stbi__jpeg_reset(z);
   if (!z->progressive) {
      if (z->scan_n == 1) {
         int i,j;
         STBI_SIMD_ALIGN(short, data[64]);
         int n = z->order[0];
         // non-interleaved data, we just need to process one block at a time,
         // in trivial scanline order
         // number of blocks to do just depends on how many actual "pixels" this
         // component has, independent of interleaved MCU blocking and such
         int w = (z->img_comp[n].x+7) >> 3;
         int h = (z->img_comp[n].y+7) >> 3;
         for (j=0; j < h; ++j) {
            for (i=0; i < w; ++i) {
               int ha = z->img_comp[n].ha;
               if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
               z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
               // every data block is an MCU, so count down the restart interval
               if (--z->todo <= 0) {
                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
                  // if it's NOT a restart, then just bail, so we get corrupt data
                  // rather than no data
                  if (!STBI__RESTART(z->marker)) return 1;
                  stbi__jpeg_reset(z);
               }
            }
         }
         return 1;
      } else { // interleaved
         int i,j,k,x,y;
         STBI_SIMD_ALIGN(short, data[64]);
         for (j=0; j < z->img_mcu_y; ++j) {
            for (i=0; i < z->img_mcu_x; ++i) {
               // scan an interleaved mcu... process scan_n components in order
               for (k=0; k < z->scan_n; ++k) {
                  int n = z->order[k];
                  // scan out an mcu's worth of this component; that's just determined
                  // by the basic H and V specified for the component
                  for (y=0; y < z->img_comp[n].v; ++y) {
                     for (x=0; x < z->img_comp[n].h; ++x) {
                        int x2 = (i*z->img_comp[n].h + x)*8;
                        int y2 = (j*z->img_comp[n].v + y)*8;
                        int ha = z->img_comp[n].ha;
                        if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
                        z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
                     }
                  }
               }
               // after all interleaved components, that's an interleaved MCU,
               // so now count down the restart interval
               if (--z->todo <= 0) {
                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
                  if (!STBI__RESTART(z->marker)) return 1;
                  stbi__jpeg_reset(z);
               }
            }
         }
         return 1;
      }
   } else {
      if (z->scan_n == 1) {
         int i,j;
         int n = z->order[0];
         // non-interleaved data, we just need to process one block at a time,
         // in trivial scanline order
         // number of blocks to do just depends on how many actual "pixels" this
         // component has, independent of interleaved MCU blocking and such
         int w = (z->img_comp[n].x+7) >> 3;
         int h = (z->img_comp[n].y+7) >> 3;
         for (j=0; j < h; ++j) {
            for (i=0; i < w; ++i) {
               short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
               if (z->spec_start == 0) {
                  if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
                     return 0;
               } else {
                  int ha = z->img_comp[n].ha;
                  if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
                     return 0;
               }
               // every data block is an MCU, so count down the restart interval
               if (--z->todo <= 0) {
                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
                  if (!STBI__RESTART(z->marker)) return 1;
                  stbi__jpeg_reset(z);
               }
            }
         }
         return 1;
      } else { // interleaved
         int i,j,k,x,y;
         for (j=0; j < z->img_mcu_y; ++j) {
            for (i=0; i < z->img_mcu_x; ++i) {
               // scan an interleaved mcu... process scan_n components in order
               for (k=0; k < z->scan_n; ++k) {
                  int n = z->order[k];
                  // scan out an mcu's worth of this component; that's just determined
                  // by the basic H and V specified for the component
                  for (y=0; y < z->img_comp[n].v; ++y) {
                     for (x=0; x < z->img_comp[n].h; ++x) {
                        int x2 = (i*z->img_comp[n].h + x);
                        int y2 = (j*z->img_comp[n].v + y);
                        short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
                        if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
                           return 0;
                     }
                  }
               }
               // after all interleaved components, that's an interleaved MCU,
               // so now count down the restart interval
               if (--z->todo <= 0) {
                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
                  if (!STBI__RESTART(z->marker)) return 1;
                  stbi__jpeg_reset(z);
               }
            }
         }
         return 1;
      }
   }
}
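The two scan layouts visit different numbers of blocks. A non-interleaved scan covers only the blocks that contain component samples, `ceil(x/8)` by `ceil(y/8)`. An interleaved scan covers whole MCUs of `h` by `v` blocks each, so it can include padding blocks on the right and bottom edges. The worked sketch below uses a 17x9 image with 2x2 luma and 1x1 chroma sampling. The component-size formula `(img_x*h + h_max-1)/h_max` and the MCU count `ceil(img_x / (8*h_max))` are assumptions modeled on how stb_image sizes components in its frame-header code. All names are hypothetical.

```c
#include <assert.h>

struct comp_layout {
   int x, y;                          /* component size in samples */
   int blocks_w, blocks_h;            /* blocks in a non-interleaved scan */
   int mcu_blocks_w, mcu_blocks_h;    /* blocks covered by an interleaved scan */
};

static struct comp_layout layout(int img_x, int img_y, int h, int v, int h_max, int v_max)
{
   struct comp_layout c;
   int mcu_x = (img_x + 8*h_max - 1) / (8*h_max);   /* MCUs across */
   int mcu_y = (img_y + 8*v_max - 1) / (8*v_max);   /* MCUs down */
   c.x = (img_x * h + h_max - 1) / h_max;
   c.y = (img_y * v + v_max - 1) / v_max;
   c.blocks_w = (c.x + 7) >> 3;       /* as in the scan_n == 1 branch */
   c.blocks_h = (c.y + 7) >> 3;
   c.mcu_blocks_w = mcu_x * h;        /* block column i*h + x, for x < h */
   c.mcu_blocks_h = mcu_y * v;
   return c;
}
```

For luma, the interleaved scan decodes a 4x2 grid of blocks where the non-interleaved one decodes 3x2. The extra column lands in the padding of the component buffer, whose row stride `w2` is a whole number of MCUs wide.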

static void stbi__jpeg_dequantize(short *data, stbi__uint16 *dequant)
{
   int i;
   for (i=0; i < 64; ++i)
      data[i] *= dequant[i];
}

static void stbi__jpeg_finish(stbi__jpeg *z)
{
   if (z->progressive) {
      // dequantize and idct the data
      int i,j,n;
      for (n=0; n < z->s->img_n; ++n) {
         int w = (z->img_comp[n].x+7) >> 3;
         int h = (z->img_comp[n].y+7) >> 3;
         for (j=0; j < h; ++j) {
            for (i=0; i < w; ++i) {
               short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
               stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
               z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
            }
         }
      }
   }
}

static int stbi__process_marker(stbi__jpeg *z, int m)
{
   int L;
   switch (m) {
      case STBI__MARKER_none: // no marker found
         return stbi__err("expected marker","Corrupt JPEG");

      case 0xDD: // DRI - specify restart interval
         if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
         z->restart_interval = stbi__get16be(z->s);
         return 1;

      case 0xDB: // DQT - define quantization table
         L = stbi__get16be(z->s)-2;
         while (L > 0) {
            int q = stbi__get8(z->s);
            int p = q >> 4, sixteen = (p != 0);
            int t = q & 15,i;
            if (p != 0 && p != 1) return stbi__err("bad DQT type","Corrupt JPEG");
            if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");

            for (i=0; i < 64; ++i)
               z->dequant[t][stbi__jpeg_dezigzag[i]] = (stbi__uint16)(sixteen ? stbi__get16be(z->s) : stbi__get8(z->s));
            L -= (sixteen ? 129 : 65);
         }
         return L==0;

      case 0xC4: // DHT - define huffman table
         L = stbi__get16be(z->s)-2;
         while (L > 0) {
            stbi_uc *v;
            int sizes[16],i,n=0;
            int q = stbi__get8(z->s);
            int tc = q >> 4;
            int th = q & 15;
            if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
            for (i=0; i < 16; ++i) {
               sizes[i] = stbi__get8(z->s);
               n += sizes[i];
            }
            // values[] holds at most 256 symbols; reject larger counts before building or reading them
            if (n > 256) return stbi__err("bad DHT header","Corrupt JPEG");
            L -= 17;
            if (tc == 0) {
               if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
               v = z->huff_dc[th].values;
            } else {
               if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
               v = z->huff_ac[th].values;
            }
            for (i=0; i < n; ++i)
               v[i] = stbi__get8(z->s);
            if (tc != 0)
               stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
            L -= n;
         }
         return L==0;
   }

   // check for comment block or APP blocks
   if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
      L = stbi__get16be(z->s);
      if (L < 2) {
         if (m == 0xFE)
            return stbi__err("bad COM len","Corrupt JPEG");
         else
            return stbi__err("bad APP len","Corrupt JPEG");
      }
      L -= 2;

      if (m == 0xE0 && L >= 5) { // JFIF APP0 segment
         static const unsigned char tag[5] = {'J','F','I','F','\0'};
         int ok = 1;
         int i;
         for (i=0; i < 5; ++i)
            if (stbi__get8(z->s) != tag[i])
               ok = 0;
         L -= 5;
         if (ok)
            z->jfif = 1;
      } else if (m == 0xEE && L >= 12) { // Adobe APP14 segment
         static const unsigned char tag[6] = {'A','d','o','b','e','\0'};
         int ok = 1;
         int i;
         for (i=0; i < 6; ++i)
            if (stbi__get8(z->s) != tag[i])
               ok = 0;
         L -= 6;
         if (ok) {
            stbi__get8(z->s); // version
            stbi__get16be(z->s); // flags0
            stbi__get16be(z->s); // flags1
            z->app14_color_transform = stbi__get8(z->s); // color transform
            L -= 6;
         }
      }

      stbi__skip(z->s, L);
      return 1;
   }

   return stbi__err("unknown marker","Corrupt JPEG");
}
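A DQT segment can carry several tables back to back. Each table is a one-byte `Pq/Tq` header followed by 64 entries in zigzag order, so it consumes 65 bytes (8-bit entries) or 129 bytes (16-bit entries) of the segment length. The self-contained sketch below parses such a segment from a byte buffer. `parse_dqt` and `zigzag_to_natural` are hypothetical. The table is the standard JPEG zigzag order, which is what `stbi__jpeg_dezigzag` encodes. The bounds checks on the buffer are an addition, assumed for safety, and are not in the stream-based original.

```c
#include <assert.h>

/* Standard JPEG zigzag scan position -> natural (row-major) index. */
static const unsigned char zigzag_to_natural[64] = {
    0,  1,  8, 16,  9,  2,  3, 10,
   17, 24, 32, 25, 18, 11,  4,  5,
   12, 19, 26, 33, 40, 48, 41, 34,
   27, 20, 13,  6,  7, 14, 21, 28,
   35, 42, 49, 56, 57, 50, 43, 36,
   29, 22, 15, 23, 30, 37, 44, 51,
   58, 59, 52, 45, 38, 31, 39, 46,
   53, 60, 61, 54, 47, 55, 62, 63
};

/* seg points just past the 0xFFDB marker, at the 2-byte length field.
   Returns 1 if every table parses and the length is consumed exactly. */
static int parse_dqt(const unsigned char *seg, int seg_len, unsigned short tables[4][64])
{
   int pos = 2, L;
   if (seg_len < 2) return 0;
   L = ((seg[0] << 8) | seg[1]) - 2;          /* length includes itself */
   if (2 + L > seg_len) return 0;             /* truncated segment */
   while (L > 0) {
      int q = seg[pos++];
      int p = q >> 4, sixteen = (p != 0), t = q & 15, i;
      int need = sixteen ? 129 : 65;
      if (p != 0 && p != 1) return 0;         /* bad DQT type */
      if (t > 3) return 0;                    /* bad DQT table */
      if (L < need) return 0;                 /* table runs past the segment */
      for (i = 0; i < 64; ++i) {
         int val = sixteen ? (seg[pos] << 8) | seg[pos + 1] : seg[pos];
         pos += sixteen ? 2 : 1;
         tables[t][zigzag_to_natural[i]] = (unsigned short) val;
      }
      L -= need;
   }
   return L == 0;
}
```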

// after we see SOS
static int stbi__process_scan_header(stbi__jpeg *z)
{
   int i;
   int Ls = stbi__get16be(z->s);
   z->scan_n = stbi__get8(z->s);
   if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
   if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
   for (i=0; i < z->scan_n; ++i) {
      int id = stbi__get8(z->s), which;
      int q = stbi__get8(z->s);
      for (which = 0; which < z->s->img_n; ++which)
         if (z->img_comp[which].id == id)
            break;
      if (which == z->s->img_n) return 0; // no match
      z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
      z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
      z->order[i] = which;
   }

   {
      int aa;
      z->spec_start = stbi__get8(z->s);
      z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
      aa = stbi__get8(z->s);
      z->succ_high = (aa >> 4);
      z->succ_low  = (aa & 15);
      if (z->progressive) {
         if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
            return stbi__err("bad SOS", "Corrupt JPEG");
      } else {
         if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
         if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
         z->spec_end = 63;
      }
   }

   return 1;
}
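The SOS checks reduce to a few arithmetic rules:

- the header length must be `6 + 2*Ns`;
- baseline scans must use `Ss=0` and `Ah=Al=0`, and are normalized to `Se=63`;
- progressive scans need `Ss <= Se <= 63`.

The sketch below covers only those rules, with hypothetical names and no stream reading. It mirrors the checks in `stbi__process_scan_header` rather than the full JPEG specification, which, for instance, constrains `Ss`/`Se` further for progressive DC scans.

```c
#include <assert.h>

struct sos_params { int ns, ss, se, ah, al; };

/* Returns 1 if the header passes the same checks as
   stbi__process_scan_header; may rewrite se for baseline scans. */
static int check_sos(int ls, int img_n, int progressive, struct sos_params *p)
{
   if (p->ns < 1 || p->ns > 4 || p->ns > img_n) return 0;
   if (ls != 6 + 2 * p->ns) return 0;            /* bad SOS len */
   if (progressive) {
      if (p->ss > 63 || p->se > 63 || p->ss > p->se || p->ah > 13 || p->al > 13)
         return 0;
   } else {
      if (p->ss != 0 || p->ah != 0 || p->al != 0) return 0;
      p->se = 63;   /* "should be 63, but might be 0" */
   }
   return 1;
}
```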

static int stbi__free_jpeg_components(stbi__jpeg *z, int ncomp, int why)
{
   int i;
   for (i=0; i < ncomp; ++i) {
      if (z->img_comp[i].raw_data) {
         STBI_FREE(z->img_comp[i].raw_data);
         z->img_comp[i].raw_data = NULL;
         z->img_comp[i].data = NULL;
      }
      if (z->img_comp[i].raw_coeff) {
         STBI_FREE(z->img_comp[i].raw_coeff);
         z->img_comp[i].raw_coeff = 0;
         z->img_comp[i].coeff = 0;
      }
      if (z->img_comp[i].linebuf) {
         STBI_FREE(z->img_comp[i].linebuf);
         z->img_comp[i].linebuf = NULL;
      }
   }
   return why;
}
3188
3189 static int stbi__process_frame_header(stbi__jpeg *z, int scan)
3190 {
3191    stbi__context *s = z->s;
3192    int Lf,p,i,q, h_max=1,v_max=1,c;
3193    Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
3194    p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
3195    s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
3196    s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
3197    if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
3198    if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
3199    c = stbi__get8(s);
3200    if (c != 3 && c != 1 && c != 4) return stbi__err("bad component count","Corrupt JPEG");
3201    s->img_n = c;
3202    for (i=0; i < c; ++i) {
3203       z->img_comp[i].data = NULL;
3204       z->img_comp[i].linebuf = NULL;
3205    }
3206
3207    if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
3208
3209    z->rgb = 0;
3210    for (i=0; i < s->img_n; ++i) {
3211       static const unsigned char rgb[3] = { 'R', 'G', 'B' };
3212       z->img_comp[i].id = stbi__get8(s);
3213       if (s->img_n == 3 && z->img_comp[i].id == rgb[i])
3214          ++z->rgb;
3215       q = stbi__get8(s);
3216       z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
3217       z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
3218       z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
3219    }
3220
3221    if (scan != STBI__SCAN_load) return 1;
3222
3223    if (!stbi__mad3sizes_valid(s->img_x, s->img_y, s->img_n, 0)) return stbi__err("too large", "Image too large to decode");
3224
3225    for (i=0; i < s->img_n; ++i) {
3226       if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
3227       if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
3228    }
3229
3230    // compute interleaved mcu info
3231    z->img_h_max = h_max;
3232    z->img_v_max = v_max;
3233    z->img_mcu_w = h_max * 8;
3234    z->img_mcu_h = v_max * 8;
3235    // these sizes can't be more than 17 bits
3236    z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
3237    z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
3238
3239    for (i=0; i < s->img_n; ++i) {
3240       // number of effective pixels (e.g. for non-interleaved MCU)
3241       z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
3242       z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
3243       // to simplify generation, we'll allocate enough memory to decode
3244       // the bogus oversized data from using interleaved MCUs and their
3245       // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
3246       // discard the extra data until colorspace conversion
3247       //
3248       // img_mcu_x, img_mcu_y: <=17 bits; comp[i].h and .v are <=4 (checked earlier)
3249       // so these muls can't overflow with 32-bit ints (which we require)
3250       z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
3251       z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
3252       z->img_comp[i].coeff = 0;
3253       z->img_comp[i].raw_coeff = 0;
3254       z->img_comp[i].linebuf = NULL;
3255       z->img_comp[i].raw_data = stbi__malloc_mad2(z->img_comp[i].w2, z->img_comp[i].h2, 15);
3256       if (z->img_comp[i].raw_data == NULL)
3257          return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
3258       // align blocks for idct using mmx/sse
3259       z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
3260       if (z->progressive) {
3261          // w2, h2 are multiples of 8 (see above)
3262          z->img_comp[i].coeff_w = z->img_comp[i].w2 / 8;
3263          z->img_comp[i].coeff_h = z->img_comp[i].h2 / 8;
3264          z->img_comp[i].raw_coeff = stbi__malloc_mad3(z->img_comp[i].w2, z->img_comp[i].h2, sizeof(short), 15);
3265          if (z->img_comp[i].raw_coeff == NULL)
3266             return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
3267          z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
3268       }
3269    }
3270
3271    return 1;
3272 }
3273
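The MCU and buffer-size arithmetic in `stbi__process_frame_header` is easy to check by hand. The sketch below recomputes it for one component; `comp_geometry`, `compute_geometry` and `align16` are hypothetical names, the formulas are the integer ceil-divisions used above, and `align16` uses `uintptr_t` (an assumption; the original casts through `size_t`) for the same round-up-to-16-bytes trick applied to `raw_data`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-component geometry (not stb_image's own struct). */
typedef struct {
   int mcu_x, mcu_y;   /* MCUs across / down the whole image */
   int x, y;           /* effective (unpadded) samples for this component */
   int w2, h2;         /* padded plane size in whole MCUs, multiples of 8 */
} comp_geometry;

/* img_x/img_y: image size; h/v: this component's sampling factors;
 * h_max/v_max: the largest sampling factors over all components. */
static comp_geometry compute_geometry(int img_x, int img_y, int h, int v, int h_max, int v_max)
{
   comp_geometry g;
   int mcu_w = h_max * 8, mcu_h = v_max * 8;
   assert(h >= 1 && h <= h_max && v >= 1 && v <= v_max);
   g.mcu_x = (img_x + mcu_w - 1) / mcu_w;     /* ceil(img_x / mcu_w) */
   g.mcu_y = (img_y + mcu_h - 1) / mcu_h;
   g.x = (img_x * h + h_max - 1) / h_max;     /* ceil(img_x * h / h_max) */
   g.y = (img_y * v + v_max - 1) / v_max;
   g.w2 = g.mcu_x * h * 8;
   g.h2 = g.mcu_y * v * 8;
   return g;
}

/* Round a pointer up to the next 16-byte boundary; this is why the
 * allocation above asks for 15 extra bytes. */
static void *align16(void *p)
{
   return (void *) (((uintptr_t) p + 15) & ~(uintptr_t) 15);
}
```

With 4:2:0 sampling on a 33x17 image (luma h = v = 2, chroma h = v = 1), there are 3x2 MCUs of 16x16; the luma plane is padded to 48x32 and each chroma plane to 24x16, and the padding is only discarded during color conversion.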
// use comparisons since in some cases we handle more than one case (e.g. SOF)
#define stbi__DNL(x)         ((x) == 0xdc)
#define stbi__SOI(x)         ((x) == 0xd8)
#define stbi__EOI(x)         ((x) == 0xd9)
#define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
#define stbi__SOS(x)         ((x) == 0xda)

#define stbi__SOF_progressive(x)   ((x) == 0xc2)

static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
{
   int m;
   z->jfif = 0;
   z->app14_color_transform = -1; // valid values are 0,1,2
   z->marker = STBI__MARKER_none; // initialize cached marker to empty
   m = stbi__get_marker(z);
   if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
   if (scan == STBI__SCAN_type) return 1;
   m = stbi__get_marker(z);
   while (!stbi__SOF(m)) {
      if (!stbi__process_marker(z,m)) return 0;
      m = stbi__get_marker(z);
      while (m == STBI__MARKER_none) {
         // some files have extra padding after their blocks, so keep scanning for the next marker
         if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
         m = stbi__get_marker(z);
      }
   }
   z->progressive = stbi__SOF_progressive(m);
   if (!stbi__process_frame_header(z, scan)) return 0;
   return 1;
}

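`stbi__decode_jpeg_header` walks the marker stream from SOI until it meets an SOF marker (0xC0, 0xC1 or 0xC2) and then parses the frame header. A minimal sketch of the same walk over an in-memory buffer follows; `find_sof_size` is a hypothetical helper, and it assumes every segment between SOI and SOF carries a 2-byte big-endian length that counts itself (as APPn, DQT, DHT and COM segments do). Unlike the real decoder it does not tolerate stray padding bytes between segments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: find the first SOF0/1/2 segment in a JPEG held in
 * memory and report the frame size. Returns 1 on success, 0 otherwise. */
static int find_sof_size(const unsigned char *buf, size_t len,
                         int *width, int *height, int *progressive)
{
   size_t pos = 2;
   assert(width && height && progressive);
   if (len < 2 || buf[0] != 0xFF || buf[1] != 0xD8) return 0;   /* no SOI */
   while (pos + 4 <= len) {
      unsigned char m;
      size_t seglen;
      if (buf[pos] != 0xFF) return 0;                 /* expected a marker */
      while (pos < len && buf[pos] == 0xFF) ++pos;    /* skip fill bytes */
      if (pos + 3 > len) return 0;
      m = buf[pos++];                                 /* marker code */
      seglen = ((size_t) buf[pos] << 8) | buf[pos + 1];
      if (seglen < 2) return 0;
      if (m == 0xC0 || m == 0xC1 || m == 0xC2) {
         /* SOF layout: length(2) precision(1) height(2) width(2) ... */
         if (seglen < 8 || pos + 7 > len) return 0;
         *height = (buf[pos + 3] << 8) | buf[pos + 4];
         *width  = (buf[pos + 5] << 8) | buf[pos + 6];
         *progressive = (m == 0xC2);
         return 1;
      }
      if (m == 0xDA || m == 0xD9) return 0;           /* SOS or EOI before SOF */
      pos += seglen;                                  /* skip the whole segment */
   }
   return 0;
}
```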
// decode image to YCbCr format
static int stbi__decode_jpeg_image(stbi__jpeg *j)
{
   int m;
   for (m = 0; m < 4; m++) {
      j->img_comp[m].raw_data = NULL;
      j->img_comp[m].raw_coeff = NULL;
   }
   j->restart_interval = 0;
   if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
   m = stbi__get_marker(j);
   while (!stbi__EOI(m)) {
      if (stbi__SOS(m)) {
         if (!stbi__process_scan_header(j)) return 0;
         if (!stbi__parse_entropy_coded_data(j)) return 0;
         if (j->marker == STBI__MARKER_none ) {
            // handle 0s at the end of image data from IP Kamera 9060
            while (!stbi__at_eof(j->s)) {
               int x = stbi__get8(j->s);
               if (x == 255) {
                  j->marker = stbi__get8(j->s);
                  break;
               }
            }
            // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
         }
      } else if (stbi__DNL(m)) {
         int Ld = stbi__get16be(j->s);
         stbi__uint32 NL = stbi__get16be(j->s);
         if (Ld != 4) return stbi__err("bad DNL len", "Corrupt JPEG");
         if (NL != j->s->img_y) return stbi__err("bad DNL height", "Corrupt JPEG");
      } else {
         if (!stbi__process_marker(j, m)) return 0;
      }
      m = stbi__get_marker(j);
   }
   if (j->progressive)
      stbi__jpeg_finish(j);
   return 1;
}

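When the entropy decoder stops without having cached a marker, the loop above skips bytes until it finds 0xFF and takes the following byte as the next marker. A sketch of that recovery step over a memory buffer; `next_marker_after_garbage` is a hypothetical name, and returning -1 at end of data is an assumption of this sketch (the real reader returns 0 at EOF and lets the next `stbi__get_marker` call fail).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the trailing-garbage recovery: starting at *pos,
 * skip bytes until 0xFF, then return the byte after it as the marker code.
 * Returns -1 if the buffer ends first. */
static int next_marker_after_garbage(const unsigned char *buf, size_t len, size_t *pos)
{
   assert(pos != NULL && (buf != NULL || len == 0));
   while (*pos < len) {
      unsigned char x = buf[(*pos)++];
      if (x == 0xFF) {
         if (*pos >= len) return -1;   /* 0xFF was the final byte */
         return buf[(*pos)++];
      }
   }
   return -1;
}
```

A stream padded with zero bytes before its EOI (`00 00 00 FF D9`) therefore still yields the 0xD9 marker that ends the decode loop.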
// static jfif-centered resampling (across block boundaries)

typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
                                    int w, int hs);

#define stbi__div4(x) ((stbi_uc) ((x) >> 2))

static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   STBI_NOTUSED(out);
   STBI_NOTUSED(in_far);
   STBI_NOTUSED(w);
   STBI_NOTUSED(hs);
   return in_near;
}

static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate two samples vertically for every one in input
   int i;
   STBI_NOTUSED(hs);
   for (i=0; i < w; ++i)
      out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
   return out;
}

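Vertical 2x upsampling weights the nearer input row by 3/4 and the farther one by 1/4, with `+ 2` rounding before the shift by 2. A scalar sketch with hypothetical names (`upsample_v2_row`, `near_row`, `far_row`; the names avoid `near`/`far`, which some Windows headers define as macros):

```c
#include <assert.h>

/* Hypothetical sketch of 2x vertical upsampling with 3:1 weights:
 * out[i] = round((3*near + far) / 4). */
static void upsample_v2_row(unsigned char *out, const unsigned char *near_row,
                            const unsigned char *far_row, int w)
{
   int i;
   assert(w >= 1);
   for (i = 0; i < w; ++i)
      out[i] = (unsigned char) ((3 * near_row[i] + far_row[i] + 2) >> 2);
}
```

In `load_jpeg_image` the caller decides which stored row is "near" through the `y_bot` flag, so for a plane with exactly two rows A and B the four output rows come out as A, (3A+B)/4, (A+3B)/4 and B.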
static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate two samples horizontally for every one in input
   int i;
   stbi_uc *input = in_near;

   if (w == 1) {
      // if only one sample, can't do any interpolation
      out[0] = out[1] = input[0];
      return out;
   }

   out[0] = input[0];
   out[1] = stbi__div4(input[0]*3 + input[1] + 2);
   for (i=1; i < w-1; ++i) {
      int n = 3*input[i]+2;
      out[i*2+0] = stbi__div4(n+input[i-1]);
      out[i*2+1] = stbi__div4(n+input[i+1]);
   }
   // mirror of out[1]: 3/4 of the last input pixel plus 1/4 of its left neighbour
   out[i*2+0] = stbi__div4(input[w-1]*3 + input[w-2] + 2);
   out[i*2+1] = input[w-1];

   STBI_NOTUSED(in_far);
   STBI_NOTUSED(hs);

   return out;
}

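Horizontal 2x upsampling uses the same 3:1 triangle weights along the row: each input pixel i produces a left output (3*in[i] + in[i-1] + 2)/4 and a right output (3*in[i] + in[i+1] + 2)/4, and the two outermost outputs copy the edge pixels. A scalar sketch, with `upsample_h2` as a hypothetical name; this version applies the 3/4 weight to the pixel each output lies inside at both ends of the row.

```c
#include <assert.h>

/* Hypothetical sketch: 2x horizontal upsampling with a 3:1 triangle filter.
 * out must hold 2*w bytes; the edge outputs replicate the edge inputs. */
static void upsample_h2(unsigned char *out, const unsigned char *in, int w)
{
   int i;
   assert(w >= 1);
   if (w == 1) {                       /* nothing to interpolate against */
      out[0] = out[1] = in[0];
      return;
   }
   out[0] = in[0];
   for (i = 0; i < w; ++i) {
      int n = 3 * in[i] + 2;           /* shared 3/4 term plus rounding bias */
      if (i > 0)     out[2*i]     = (unsigned char) ((n + in[i-1]) >> 2);
      if (i < w - 1) out[2*i + 1] = (unsigned char) ((n + in[i+1]) >> 2);
   }
   out[2*w - 1] = in[w - 1];
}
```

A linear ramp stays linear: input {0, 100, 200} becomes {0, 25, 75, 125, 175, 200}.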
#define stbi__div16(x) ((stbi_uc) ((x) >> 4))

static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate 2x2 samples for every one in input
   int i,t0,t1;
   if (w == 1) {
      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
      return out;
   }

   t1 = 3*in_near[0] + in_far[0];
   out[0] = stbi__div4(t1+2);
   for (i=1; i < w; ++i) {
      t0 = t1;
      t1 = 3*in_near[i]+in_far[i];
      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   }
   out[w*2-1] = stbi__div4(t1+2);

   STBI_NOTUSED(hs);

   return out;
}

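The 2x2 upsampler is separable: a vertical 3:1 pass produces `t = 3*near + far` (scaled by 4), then a horizontal 3:1 pass on `t` scales by another 4, so interior outputs divide by 16 with a bias of 8 while the two edge outputs only divide by 4. A scalar sketch mirroring `stbi__resample_row_hv_2`, under the hypothetical name `upsample_hv2_row`:

```c
#include <assert.h>

/* Hypothetical sketch of one output row of 2x2 upsampling:
 * vertical pass t = 3*near + far, then a horizontal 3:1 pass on t. */
static void upsample_hv2_row(unsigned char *out, const unsigned char *near_row,
                             const unsigned char *far_row, int w)
{
   int i, t0, t1;
   assert(w >= 1);
   t1 = 3 * near_row[0] + far_row[0];
   if (w == 1) {
      out[0] = out[1] = (unsigned char) ((t1 + 2) >> 2);
      return;
   }
   out[0] = (unsigned char) ((t1 + 2) >> 2);            /* left edge: /4 */
   for (i = 1; i < w; ++i) {
      t0 = t1;
      t1 = 3 * near_row[i] + far_row[i];
      out[2*i - 1] = (unsigned char) ((3 * t0 + t1 + 8) >> 4);   /* /16 */
      out[2*i]     = (unsigned char) ((3 * t1 + t0 + 8) >> 4);
   }
   out[2*w - 1] = (unsigned char) ((t1 + 2) >> 2);      /* right edge: /4 */
}
```

When both input rows are identical the result reduces to the 1-D horizontal filter, e.g. {0, 100} becomes {0, 25, 75, 100}.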
#if defined(STBI_SSE2) || defined(STBI_NEON)
static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate 2x2 samples for every one in input
   int i=0,t0,t1;

   if (w == 1) {
      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
      return out;
   }

   t1 = 3*in_near[0] + in_far[0];
   // process groups of 8 pixels for as long as we can.
   // note we can't handle the last pixel in a row in this loop
   // because we need to handle the filter boundary conditions.
   for (; i < ((w-1) & ~7); i += 8) {
#if defined(STBI_SSE2)
      // load and perform the vertical filtering pass
      // this uses 3*x + y = 4*x + (y - x)
      __m128i zero  = _mm_setzero_si128();
      __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
      __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
      __m128i farw  = _mm_unpacklo_epi8(farb, zero);
      __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
      __m128i diff  = _mm_sub_epi16(farw, nearw);
      __m128i nears = _mm_slli_epi16(nearw, 2);
      __m128i curr  = _mm_add_epi16(nears, diff); // current row

      // horizontal filter works the same based on shifted versions of the current
      // row. "prev" is current row shifted right by 1 pixel; we need to
      // insert the previous pixel value (from t1).
      // "next" is current row shifted left by 1 pixel, with first pixel
      // of next block of 8 pixels added in.
      __m128i prv0 = _mm_slli_si128(curr, 2);
      __m128i nxt0 = _mm_srli_si128(curr, 2);
      __m128i prev = _mm_insert_epi16(prv0, t1, 0);
      __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);

      // horizontal filter, polyphase implementation since it's convenient:
      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
      // note the shared term.
      __m128i bias  = _mm_set1_epi16(8);
      __m128i curs = _mm_slli_epi16(curr, 2);
      __m128i prvd = _mm_sub_epi16(prev, curr);
      __m128i nxtd = _mm_sub_epi16(next, curr);
      __m128i curb = _mm_add_epi16(curs, bias);
      __m128i even = _mm_add_epi16(prvd, curb);
      __m128i odd  = _mm_add_epi16(nxtd, curb);

      // interleave even and odd pixels, then undo scaling.
      __m128i int0 = _mm_unpacklo_epi16(even, odd);
      __m128i int1 = _mm_unpackhi_epi16(even, odd);
      __m128i de0  = _mm_srli_epi16(int0, 4);
      __m128i de1  = _mm_srli_epi16(int1, 4);

      // pack and write output
      __m128i outv = _mm_packus_epi16(de0, de1);
      _mm_storeu_si128((__m128i *) (out + i*2), outv);
#elif defined(STBI_NEON)
      // load and perform the vertical filtering pass
      // this uses 3*x + y = 4*x + (y - x)
      uint8x8_t farb  = vld1_u8(in_far + i);
      uint8x8_t nearb = vld1_u8(in_near + i);
      int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
      int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
      int16x8_t curr  = vaddq_s16(nears, diff); // current row

      // horizontal filter works the same based on shifted versions of the current
      // row. "prev" is current row shifted right by 1 pixel; we need to
      // insert the previous pixel value (from t1).
      // "next" is current row shifted left by 1 pixel, with first pixel
      // of next block of 8 pixels added in.
      int16x8_t prv0 = vextq_s16(curr, curr, 7);
      int16x8_t nxt0 = vextq_s16(curr, curr, 1);
      int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
      int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);

      // horizontal filter, polyphase implementation since it's convenient:
      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
      // note the shared term.
      int16x8_t curs = vshlq_n_s16(curr, 2);
      int16x8_t prvd = vsubq_s16(prev, curr);
      int16x8_t nxtd = vsubq_s16(next, curr);
      int16x8_t even = vaddq_s16(curs, prvd);
      int16x8_t odd  = vaddq_s16(curs, nxtd);

      // undo scaling and round, then store with even/odd phases interleaved
      uint8x8x2_t o;
      o.val[0] = vqrshrun_n_s16(even, 4);
      o.val[1] = vqrshrun_n_s16(odd,  4);
      vst2_u8(out + i*2, o);
#endif

      // "previous" value for next iter
      t1 = 3*in_near[i+7] + in_far[i+7];
   }

   t0 = t1;
   t1 = 3*in_near[i] + in_far[i];
   out[i*2] = stbi__div16(3*t1 + t0 + 8);

   for (++i; i < w; ++i) {
      t0 = t1;
      t1 = 3*in_near[i]+in_far[i];
      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   }
   out[w*2-1] = stbi__div4(t1+2);

   STBI_NOTUSED(hs);

   return out;
}
#endif

static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // resample with nearest-neighbor
   int i,j;
   STBI_NOTUSED(in_far);
   for (i=0; i < w; ++i)
      for (j=0; j < hs; ++j)
         out[i*hs+j] = in_near[i];
   return out;
}

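For sampling factors other than 1 or 2, the generic resampler falls back to nearest-neighbour expansion by an integer factor `hs`. A sketch under the hypothetical name `upsample_nearest`; it covers only the horizontal direction, because the vertical repetition comes from the caller in `load_jpeg_image`, which keeps passing the same source row for several consecutive output rows.

```c
#include <assert.h>

/* Hypothetical sketch: nearest-neighbour horizontal expansion by hs.
 * out must hold w*hs bytes. */
static void upsample_nearest(unsigned char *out, const unsigned char *in, int w, int hs)
{
   int i, j;
   assert(w >= 1 && hs >= 1);
   for (i = 0; i < w; ++i)
      for (j = 0; j < hs; ++j)
         out[i * hs + j] = in[i];   /* replicate each sample hs times */
}
```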
// this is a reduced-precision calculation of YCbCr-to-RGB introduced
// to make sure the code produces the same results in both the SIMD and scalar paths
#define stbi__float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
{
   int i;
   for (i=0; i < count; ++i) {
      int y_fixed = (y[i] << 20) + (1<<19); // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed +  cr* stbi__float2fixed(1.40200f);
      g = y_fixed + (cr*-stbi__float2fixed(0.71414f)) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
      b = y_fixed                                     +   cb* stbi__float2fixed(1.77200f);
      r >>= 20;
      g >>= 20;
      b >>= 20;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}

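The scalar converter works in 20-bit fixed point: each coefficient is rounded to 12 fractional bits and shifted left by 8, Y is shifted left by 20, and `1 << 19` adds one half so the final `>> 20` rounds to nearest. A single-pixel sketch with hypothetical names (`DEMO_FIX`, `ycbcr_to_rgb_fixed`). It omits the `& 0xffff0000` mask that the source applies to the Cb contribution to green (presumably part of the reduced-precision scheme shared with the SIMD path), so its green channel may differ slightly from stb_image for some inputs. Like the original, it assumes `>>` on a negative `int` is an arithmetic shift.

```c
#include <assert.h>

/* Coefficient with 12 fractional bits, shifted up to line up with y << 20. */
#define DEMO_FIX(x) ((int) ((x) * 4096.0f + 0.5f) << 8)

static unsigned char clamp255(int v) { return (unsigned char) (v < 0 ? 0 : v > 255 ? 255 : v); }

/* Hypothetical single-pixel JFIF YCbCr -> RGB in fixed point. */
static void ycbcr_to_rgb_fixed(unsigned char y, unsigned char cb, unsigned char cr,
                               unsigned char rgb[3])
{
   int y_fixed = (y << 20) + (1 << 19);   /* Y in 20-bit fixed point, +0.5 to round */
   int crs = cr - 128, cbs = cb - 128;    /* chroma centred on zero */
   int r = y_fixed + crs * DEMO_FIX(1.40200f);
   int g = y_fixed - crs * DEMO_FIX(0.71414f) - cbs * DEMO_FIX(0.34414f);
   int b = y_fixed + cbs * DEMO_FIX(1.77200f);
   assert(rgb != 0);
   rgb[0] = clamp255(r >> 20);            /* drop fraction, then clamp to 0..255 */
   rgb[1] = clamp255(g >> 20);
   rgb[2] = clamp255(b >> 20);
}
```

For example, Y = 100, Cb = 128, Cr = 228 gives R = 100 + 1.402*100 = 240.2 and G = 100 - 71.414 = 28.586, which round to (240, 29, 100).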
#if defined(STBI_SSE2) || defined(STBI_NEON)
static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
{
   int i = 0;

#ifdef STBI_SSE2
   // step == 3 is pretty ugly on the final interleave, and I'm not convinced
   // it's useful in practice (you wouldn't use it for textures, for example).
   // So just accelerate the step == 4 case.
   if (step == 4) {
      // this is a fairly straightforward implementation and not super-optimized.
      __m128i signflip  = _mm_set1_epi8(-0x80);
      __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
      __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
      __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
      __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
      __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
      __m128i xw = _mm_set1_epi16(255); // alpha channel

      for (; i+7 < count; i += 8) {
         // load
         __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
         __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
         __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
         __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
         __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128

         // unpack to short (and left-shift cr, cb by 8)
         __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
         __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
         __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);

         // color transform
         __m128i yws = _mm_srli_epi16(yw, 4);
         __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
         __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
         __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
         __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
         __m128i rws = _mm_add_epi16(cr0, yws);
         __m128i gwt = _mm_add_epi16(cb0, yws);
         __m128i bws = _mm_add_epi16(yws, cb1);
         __m128i gws = _mm_add_epi16(gwt, cr1);

         // descale
         __m128i rw = _mm_srai_epi16(rws, 4);
         __m128i bw = _mm_srai_epi16(bws, 4);
         __m128i gw = _mm_srai_epi16(gws, 4);

         // back to byte, set up for transpose
         __m128i brb = _mm_packus_epi16(rw, bw);
         __m128i gxb = _mm_packus_epi16(gw, xw);

         // transpose to interleave channels
         __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
         __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
         __m128i o0 = _mm_unpacklo_epi16(t0, t1);
         __m128i o1 = _mm_unpackhi_epi16(t0, t1);

         // store
         _mm_storeu_si128((__m128i *) (out + 0), o0);
         _mm_storeu_si128((__m128i *) (out + 16), o1);
         out += 32;
      }
   }
#endif

#ifdef STBI_NEON
   // in this version, step=3 support would be easy to add. But is there demand?
   if (step == 4) {
      // this is a fairly straightforward implementation and not super-optimized.
      uint8x8_t signflip = vdup_n_u8(0x80);
      int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
      int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
      int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
      int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));

      for (; i+7 < count; i += 8) {
         // load
         uint8x8_t y_bytes  = vld1_u8(y + i);
         uint8x8_t cr_bytes = vld1_u8(pcr + i);
         uint8x8_t cb_bytes = vld1_u8(pcb + i);
         int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
         int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));

         // expand to s16
         int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
         int16x8_t crw = vshll_n_s8(cr_biased, 7);
         int16x8_t cbw = vshll_n_s8(cb_biased, 7);

         // color transform
         int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
         int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
         int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
         int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
         int16x8_t rws = vaddq_s16(yws, cr0);
         int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
         int16x8_t bws = vaddq_s16(yws, cb1);

         // undo scaling, round, convert to byte
         uint8x8x4_t o;
         o.val[0] = vqrshrun_n_s16(rws, 4);
         o.val[1] = vqrshrun_n_s16(gws, 4);
         o.val[2] = vqrshrun_n_s16(bws, 4);
         o.val[3] = vdup_n_u8(255);

         // store, interleaving r/g/b/a
         vst4_u8(out, o);
         out += 8*4;
      }
   }
#endif

   for (; i < count; ++i) {
      int y_fixed = (y[i] << 20) + (1<<19); // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed + cr* stbi__float2fixed(1.40200f);
      g = y_fixed + cr*-stbi__float2fixed(0.71414f) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
      b = y_fixed                                   +   cb* stbi__float2fixed(1.77200f);
      r >>= 20;
      g >>= 20;
      b >>= 20;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}
#endif

// set up the kernels
static void stbi__setup_jpeg(stbi__jpeg *j)
{
   j->idct_block_kernel = stbi__idct_block;
   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;

#ifdef STBI_SSE2
   if (stbi__sse2_available()) {
      j->idct_block_kernel = stbi__idct_simd;
      j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
      j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   }
#endif

#ifdef STBI_NEON
   j->idct_block_kernel = stbi__idct_simd;
   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
#endif
}

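`stbi__setup_jpeg` follows a common dispatch pattern: store portable kernels in function pointers first, then overwrite them with accelerated versions when the platform allows. A sketch of that pattern with stand-in kernels; every name here (`sum_kernel`, `kernel_table`, `setup_kernels`, the `cpu_has_simd` flag) is hypothetical, and the "fast" kernel is merely unrolled C standing in for a SIMD version.

```c
#include <assert.h>

/* Hypothetical kernel signature; stb_image's real kernels differ. */
typedef int (*sum_kernel)(const unsigned char *p, int n);

static int sum_scalar(const unsigned char *p, int n)
{
   int i, s = 0;
   for (i = 0; i < n; ++i) s += p[i];
   return s;
}

/* Stand-in for a SIMD kernel: same result, 4 bytes per step. */
static int sum_unrolled(const unsigned char *p, int n)
{
   int i = 0, s = 0;
   for (; i + 3 < n; i += 4) s += p[i] + p[i+1] + p[i+2] + p[i+3];
   for (; i < n; ++i) s += p[i];   /* leftover tail */
   return s;
}

typedef struct {
   sum_kernel sum;
} kernel_table;

static void setup_kernels(kernel_table *k, int cpu_has_simd)
{
   assert(k != 0);
   k->sum = sum_scalar;        /* portable default, always valid */
   if (cpu_has_simd)           /* stands in for a runtime CPU-feature check */
      k->sum = sum_unrolled;
}
```

Note the asymmetry in the original: the SSE2 kernels are selected at run time through `stbi__sse2_available()`, while the NEON kernels are selected unconditionally whenever `STBI_NEON` is defined at compile time.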
// clean up the temporary component buffers
static void stbi__cleanup_jpeg(stbi__jpeg *j)
{
   stbi__free_jpeg_components(j, j->s->img_n, 0);
}

typedef struct
{
   resample_row_func resample;
   stbi_uc *line0,*line1;
   int hs,vs;   // expansion factor in each axis
   int w_lores; // horizontal pixels pre-expansion
   int ystep;   // how far through vertical expansion we are
   int ypos;    // which pre-expansion row we're on
} stbi__resample;

// fast rounded multiply of two 0..255 values, scaled back to 0..255: round(x*y/255)
static stbi_uc stbi__blinn_8x8(stbi_uc x, stbi_uc y)
{
   unsigned int t = x*y + 128;
   return (stbi_uc) ((t + (t >>8)) >> 8);
}

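`stbi__blinn_8x8` replaces the division in x*y/255 with a shift-and-add. For 8-bit inputs the trick returns exactly the correctly rounded quotient, which a brute-force comparison over all 65536 input pairs can confirm. In the sketch below, `mul255`, `mul255_ref` and `count_mismatches` are hypothetical names; the reference uses `(2*x*y + 255) / 510`, which is round(x*y/255) because x*y/255 can never fall exactly halfway between two integers.

```c
#include <assert.h>

/* Same shift-and-add trick as stbi__blinn_8x8. */
static unsigned char mul255(unsigned char x, unsigned char y)
{
   unsigned int t = (unsigned int) x * y + 128;
   return (unsigned char) ((t + (t >> 8)) >> 8);
}

/* Reference: round(x*y/255) computed with an integer division. */
static unsigned char mul255_ref(unsigned char x, unsigned char y)
{
   return (unsigned char) ((2u * x * y + 255u) / 510u);
}

/* Compare both versions over every input pair; returns the mismatch count. */
static int count_mismatches(void)
{
   int x, y, bad = 0;
   for (x = 0; x < 256; ++x)
      for (y = 0; y < 256; ++y)
         if (mul255((unsigned char) x, (unsigned char) y) !=
             mul255_ref((unsigned char) x, (unsigned char) y))
            ++bad;
   return bad;
}
```

In `load_jpeg_image` this multiply scales each of the first three channels by the fourth one for CMYK (APP14 transform 0) and YCCK (transform 2) images.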
3764 static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
3765 {
3766    int n, decode_n, is_rgb;
3767    z->s->img_n = 0; // make stbi__cleanup_jpeg safe
3768
3769    // validate req_comp
3770    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
3771
3772    // load a jpeg image from whichever source, but leave in YCbCr format
3773    if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }
3774
3775    // determine actual number of components to generate
3776    n = req_comp ? req_comp : z->s->img_n >= 3 ? 3 : 1;
3777
3778    is_rgb = z->s->img_n == 3 && (z->rgb == 3 || (z->app14_color_transform == 0 && !z->jfif));
3779
3780    if (z->s->img_n == 3 && n < 3 && !is_rgb)
3781       decode_n = 1;
3782    else
3783       decode_n = z->s->img_n;
3784
3785    // resample and color-convert
3786    {
3787       int k;
3788       unsigned int i,j;
3789       stbi_uc *output;
3790       stbi_uc *coutput[4] = { NULL, NULL, NULL, NULL };
3791
3792       stbi__resample res_comp[4];
3793
3794       for (k=0; k < decode_n; ++k) {
3795          stbi__resample *r = &res_comp[k];
3796
3797          // allocate line buffer big enough for upsampling off the edges
3798          // with upsample factor of 4
3799          z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
3800          if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3801
3802          r->hs      = z->img_h_max / z->img_comp[k].h;
3803          r->vs      = z->img_v_max / z->img_comp[k].v;
3804          r->ystep   = r->vs >> 1;
3805          r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
3806          r->ypos    = 0;
3807          r->line0   = r->line1 = z->img_comp[k].data;
3808
3809          if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
3810          else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
3811          else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
3812          else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
3813          else                               r->resample = stbi__resample_row_generic;
3814       }
3815
3816       // can't error after this so, this is safe
3817       output = (stbi_uc *) stbi__malloc_mad3(n, z->s->img_x, z->s->img_y, 1);
3818       if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3819
3820       // now go ahead and resample
3821       for (j=0; j < z->s->img_y; ++j) {
3822          stbi_uc *out = output + n * z->s->img_x * j;
3823          for (k=0; k < decode_n; ++k) {
3824             stbi__resample *r = &res_comp[k];
3825             int y_bot = r->ystep >= (r->vs >> 1);
3826             coutput[k] = r->resample(z->img_comp[k].linebuf,
3827                                      y_bot ? r->line1 : r->line0,
3828                                      y_bot ? r->line0 : r->line1,
3829                                      r->w_lores, r->hs);
3830             if (++r->ystep >= r->vs) {
3831                r->ystep = 0;
3832                r->line0 = r->line1;
3833                if (++r->ypos < z->img_comp[k].y)
3834                   r->line1 += z->img_comp[k].w2;
3835             }
3836          }
3837          if (n >= 3) {
3838             stbi_uc *y = coutput[0];
3839             if (z->s->img_n == 3) {
3840                if (is_rgb) {
3841                   for (i=0; i < z->s->img_x; ++i) {
3842                      out[0] = y[i];
3843                      out[1] = coutput[1][i];
3844                      out[2] = coutput[2][i];
3845                      out[3] = 255;
3846                      out += n;
3847                   }
3848                } else {
3849                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3850                }
3851             } else if (z->s->img_n == 4) {
3852                if (z->app14_color_transform == 0) { // CMYK
3853                   for (i=0; i < z->s->img_x; ++i) {
3854                      stbi_uc m = coutput[3][i];
3855                      out[0] = stbi__blinn_8x8(coutput[0][i], m);
3856                      out[1] = stbi__blinn_8x8(coutput[1][i], m);
3857                      out[2] = stbi__blinn_8x8(coutput[2][i], m);
3858                      out[3] = 255;
3859                      out += n;
3860                   }
3861                } else if (z->app14_color_transform == 2) { // YCCK
3862                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3863                   for (i=0; i < z->s->img_x; ++i) {
3864                      stbi_uc m = coutput[3][i];
3865                      out[0] = stbi__blinn_8x8(255 - out[0], m);
3866                      out[1] = stbi__blinn_8x8(255 - out[1], m);
3867                      out[2] = stbi__blinn_8x8(255 - out[2], m);
3868                      out += n;
3869                   }
3870                } else { // YCbCr + alpha?  Ignore the fourth channel for now
                  z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
               }
            } else
               for (i=0; i < z->s->img_x; ++i) {
                  out[0] = out[1] = out[2] = y[i];
                  out[3] = 255; // not used if n==3
                  out += n;
               }
         } else {
            if (is_rgb) {
               if (n == 1)
                  for (i=0; i < z->s->img_x; ++i)
                     *out++ = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
               else {
                  for (i=0; i < z->s->img_x; ++i, out += 2) {
                     out[0] = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
                     out[1] = 255;
                  }
               }
            } else if (z->s->img_n == 4 && z->app14_color_transform == 0) {
               for (i=0; i < z->s->img_x; ++i) {
                  stbi_uc m = coutput[3][i];
                  stbi_uc r = stbi__blinn_8x8(coutput[0][i], m);
                  stbi_uc g = stbi__blinn_8x8(coutput[1][i], m);
                  stbi_uc b = stbi__blinn_8x8(coutput[2][i], m);
                  out[0] = stbi__compute_y(r, g, b);
                  out[1] = 255;
                  out += n;
               }
            } else if (z->s->img_n == 4 && z->app14_color_transform == 2) {
               for (i=0; i < z->s->img_x; ++i) {
                  out[0] = stbi__blinn_8x8(255 - coutput[0][i], coutput[3][i]);
                  out[1] = 255;
                  out += n;
               }
            } else {
               stbi_uc *y = coutput[0];
               if (n == 1)
                  for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
               else
                  for (i=0; i < z->s->img_x; ++i) { *out++ = y[i]; *out++ = 255; }
            }
         }
      }
      stbi__cleanup_jpeg(z);
      *out_x = z->s->img_x;
      *out_y = z->s->img_y;
      if (comp) *comp = z->s->img_n >= 3 ? 3 : 1; // report original components, not output
      return output;
   }
}
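
/*
Sketch (not part of stb_image; every sketch_ name is hypothetical): the CMYK
branches above scale channels 0..2 by channel 3 with stbi__blinn_8x8, an 8-bit
multiply that divides by 255 with rounding, and then collapse RGB to one luma
byte with stbi__compute_y. The standalone version below assumes the usual
integer forms of those helpers: Blinn's (t + (t >> 8)) >> 8 rounding trick and
luma weights 77/150/29, which sum to 256 so a final >> 8 normalizes the result.

```c
#include <assert.h>
#include <stdint.h>

// x*y/255 rounded to nearest: t = x*y + 128, result = (t + (t >> 8)) >> 8.
static uint8_t sketch_mul_div255(uint8_t x, uint8_t y)
{
   unsigned int t = (unsigned int) x * y + 128;
   return (uint8_t) ((t + (t >> 8)) >> 8);
}

// Integer luma; the weights sum to 256, so the final shift normalizes to 0..255.
static uint8_t sketch_luma(int r, int g, int b)
{
   return (uint8_t) ((r * 77 + g * 150 + b * 29) >> 8);
}

// Gray value for one 4-channel sample, following the app14_color_transform == 0
// branch: channels 0..2 are each scaled by channel 3 before the luma step.
static uint8_t sketch_cmyk_to_gray(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t k)
{
   uint8_t r = sketch_mul_div255(c0, k);
   uint8_t g = sketch_mul_div255(c1, k);
   uint8_t b = sketch_mul_div255(c2, k);
   return sketch_luma(r, g, b);
}
```

For example, the sample (255, 255, 255, 255) maps to gray 255, while a zero
in channel 3 forces gray 0 regardless of the other channels.
*/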

static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
{
   unsigned char* result;
   stbi__jpeg* j = (stbi__jpeg*) stbi__malloc(sizeof(stbi__jpeg));
   if (!j) return stbi__errpuc("outofmem", "Out of memory");
   STBI_NOTUSED(ri);
   j->s = s;
   stbi__setup_jpeg(j);
   result = load_jpeg_image(j, x,y,comp,req_comp);
   STBI_FREE(j);
   return result;
}

static int stbi__jpeg_test(stbi__context *s)
{
   int r;
   stbi__jpeg* j = (stbi__jpeg*)stbi__malloc(sizeof(stbi__jpeg));
   if (!j) return 0;
   j->s = s;
   stbi__setup_jpeg(j);
   r = stbi__decode_jpeg_header(j, STBI__SCAN_type);
   stbi__rewind(s);
   STBI_FREE(j);
   return r;
}

static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
{
   if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
      stbi__rewind( j->s );
      return 0;
   }
   if (x) *x = j->s->img_x;
   if (y) *y = j->s->img_y;
   if (comp) *comp = j->s->img_n >= 3 ? 3 : 1;
   return 1;
}

static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
{
   int result;
   stbi__jpeg* j = (stbi__jpeg*) (stbi__malloc(sizeof(stbi__jpeg)));
   if (!j) return stbi__err("outofmem", "Out of memory");
   j->s = s;
   result = stbi__jpeg_info_raw(j, x, y, comp);
   STBI_FREE(j);
   return result;
}
#endif

// public domain zlib decode    v0.2  Sean Barrett 2006-11-18
//    simple implementation
//      - all input must be provided in an upfront buffer
//      - all output is written to a single output buffer (can malloc/realloc)
//    performance
//      - fast huffman

#ifndef STBI_NO_ZLIB

// fast-way is faster to check than jpeg huffman, but slow way is slower
#define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
#define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)

// zlib-style huffman encoding
// (jpeg packs from left, zlib from right, so can't share code)
typedef struct
{
   stbi__uint16 fast[1 << STBI__ZFAST_BITS];
   stbi__uint16 firstcode[16];
   int maxcode[17];
   stbi__uint16 firstsymbol[16];
   stbi_uc  size[288];
   stbi__uint16 value[288];
} stbi__zhuffman;

stbi_inline static int stbi__bitreverse16(int n)
{
  n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
  n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
  n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
  n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
  return n;
}

stbi_inline static int stbi__bit_reverse(int v, int bits)
{
   STBI_ASSERT(bits <= 16);
   // to bit reverse n bits, reverse 16 and shift
   // e.g. 11 bits, bit reverse and shift away 5
   return stbi__bitreverse16(v) >> (16-bits);
}
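
/*
Sketch (not part of stb_image; names are hypothetical): DEFLATE packs Huffman
codes starting from their most significant bit, but the stream is consumed
least-significant bit first, so table indices need the code bits reversed.
stbi__bitreverse16 swaps adjacent bits, then bit pairs, nibbles and bytes;
reversing only the low "bits" bits is a 16-bit reversal followed by a right
shift by (16 - bits). The naive loop exists only to cross-check the swap version.

```c
#include <assert.h>

// Reverse all 16 bits by swapping 1-bit, 2-bit, 4-bit, then 8-bit groups.
static int sketch_reverse16(int n)
{
   n = ((n & 0xAAAA) >> 1) | ((n & 0x5555) << 1);
   n = ((n & 0xCCCC) >> 2) | ((n & 0x3333) << 2);
   n = ((n & 0xF0F0) >> 4) | ((n & 0x0F0F) << 4);
   n = ((n & 0xFF00) >> 8) | ((n & 0x00FF) << 8);
   return n;
}

// Reverse the low `bits` bits of v: reverse 16, then shift away the extra bits.
static int sketch_reverse_bits(int v, int bits)
{
   assert(bits >= 1 && bits <= 16);
   return sketch_reverse16(v) >> (16 - bits);
}

// Reference version that moves one bit per iteration.
static int sketch_reverse_bits_naive(int v, int bits)
{
   int r = 0, i;
   for (i = 0; i < bits; ++i)
      r = (r << 1) | ((v >> i) & 1);
   return r;
}
```

For example, the 3-bit code 110 becomes 011 once reversed.
*/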

static int stbi__zbuild_huffman(stbi__zhuffman *z, const stbi_uc *sizelist, int num)
{
   int i,k=0;
   int code, next_code[16], sizes[17];

   // DEFLATE spec for generating codes
   memset(sizes, 0, sizeof(sizes));
   memset(z->fast, 0, sizeof(z->fast));
   for (i=0; i < num; ++i)
      ++sizes[sizelist[i]];
   sizes[0] = 0;
   for (i=1; i < 16; ++i)
      if (sizes[i] > (1 << i))
         return stbi__err("bad sizes", "Corrupt PNG");
   code = 0;
   for (i=1; i < 16; ++i) {
      next_code[i] = code;
      z->firstcode[i] = (stbi__uint16) code;
      z->firstsymbol[i] = (stbi__uint16) k;
      code = (code + sizes[i]);
      if (sizes[i])
         if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
      z->maxcode[i] = code << (16-i); // preshift for inner loop
      code <<= 1;
      k += sizes[i];
   }
   z->maxcode[16] = 0x10000; // sentinel
   for (i=0; i < num; ++i) {
      int s = sizelist[i];
      if (s) {
         int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
         stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
         z->size [c] = (stbi_uc     ) s;
         z->value[c] = (stbi__uint16) i;
         if (s <= STBI__ZFAST_BITS) {
            int j = stbi__bit_reverse(next_code[s],s);
            while (j < (1 << STBI__ZFAST_BITS)) {
               z->fast[j] = fastv;
               j += (1 << s);
            }
         }
         ++next_code[s];
      }
   }
   return 1;
}
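
/*
Sketch (not part of stb_image; names are hypothetical): stbi__zbuild_huffman
follows the canonical-code procedure of RFC 1951, section 3.2.2. It counts the
codes of each length, derives the first code of each length, then hands out
consecutive codes to symbols in symbol order. The version below performs only
that assignment step (no fast table, no preshifted maxcode) and is checked
against the RFC's own example, symbols A..H with lengths 3,3,3,3,3,2,4,4.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_BITS 15

// Assign canonical Huffman codes from code lengths (RFC 1951, 3.2.2).
// lengths[i] == 0 marks symbol i as unused (its code is set to -1).
// Returns 0 if the lengths over-subscribe the code space, 1 otherwise.
static int sketch_canonical_codes(const unsigned char *lengths, int num, int *codes)
{
   int bl_count[SKETCH_MAX_BITS + 1];
   int next_code[SKETCH_MAX_BITS + 1];
   int code = 0, bits, i;

   memset(bl_count, 0, sizeof(bl_count));
   for (i = 0; i < num; ++i) {
      assert(lengths[i] <= SKETCH_MAX_BITS);
      ++bl_count[lengths[i]];
   }
   bl_count[0] = 0;

   // Smallest code value for each length.
   for (bits = 1; bits <= SKETCH_MAX_BITS; ++bits) {
      code = (code + bl_count[bits - 1]) << 1;
      next_code[bits] = code;
      if (bl_count[bits] && next_code[bits] + bl_count[bits] > (1 << bits))
         return 0; // more codes of this length than fit in `bits` bits
   }

   // Consecutive values to symbols of the same length, in symbol order.
   for (i = 0; i < num; ++i)
      codes[i] = lengths[i] ? next_code[lengths[i]]++ : -1;
   return 1;
}
```

With the RFC lengths this yields F = 00, A..E = 010..110, G = 1110 and
H = 1111, the table printed in the specification.
*/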

// zlib-from-memory implementation for PNG reading
//    because PNG allows splitting the zlib stream arbitrarily,
//    and it's annoying structurally to have PNG call ZLIB call PNG,
//    we require PNG to read all the IDATs and combine them into a single
//    memory buffer

typedef struct
{
   stbi_uc *zbuffer, *zbuffer_end;
   int num_bits;
   stbi__uint32 code_buffer;

   char *zout;
   char *zout_start;
   char *zout_end;
   int   z_expandable;

   stbi__zhuffman z_length, z_distance;
} stbi__zbuf;

stbi_inline static int stbi__zeof(stbi__zbuf *z)
{
   return (z->zbuffer >= z->zbuffer_end);
}

stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
{
   return stbi__zeof(z) ? 0 : *z->zbuffer++;
}

static void stbi__fill_bits(stbi__zbuf *z)
{
   do {
      if (z->code_buffer >= (1U << z->num_bits)) {
        z->zbuffer = z->zbuffer_end;  /* treat this as EOF so we fail. */
        return;
      }
      z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
      z->num_bits += 8;
   } while (z->num_bits <= 24);
}

stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
{
   unsigned int k;
   if (z->num_bits < n) stbi__fill_bits(z);
   k = z->code_buffer & ((1 << n) - 1);
   z->code_buffer >>= n;
   z->num_bits -= n;
   return k;
}
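
/*
Sketch (not part of stb_image; names are hypothetical): stbi__fill_bits and
stbi__zreceive implement DEFLATE's LSB-first bit order. Whole bytes are OR'ed
in above the bits already buffered, and each read takes the low n bits and
shifts them out. Like stbi__zget8, the sketch feeds zero bytes once the input
is exhausted; it omits the extra corruption check in stbi__fill_bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
   const uint8_t *p, *end;
   uint32_t bitbuf;  // pending bits; the next bit to read is bit 0
   int nbits;        // number of valid bits in bitbuf
} sketch_bitreader;

// Read n bits (0 <= n <= 24), least-significant bit first.
static unsigned sketch_getbits(sketch_bitreader *br, int n)
{
   unsigned v;
   assert(n >= 0 && n <= 24);
   while (br->nbits < n) {
      uint32_t byte = (br->p < br->end) ? *br->p++ : 0; // zero-fill past the end
      br->bitbuf |= byte << br->nbits;
      br->nbits += 8;
   }
   v = br->bitbuf & ((1u << n) - 1);
   br->bitbuf >>= n;
   br->nbits -= n;
   return v;
}
```

Reading the byte 0x8D as 1, 2 and 5 bits yields 1, 2 and 17, which is how a
block's BFINAL and BTYPE fields come out of the first byte.
*/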

static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
{
   int b,s,k;
   // not resolved by fast table, so compute it the slow way
   // use jpeg approach, which requires MSbits at top
   k = stbi__bit_reverse(a->code_buffer, 16);
   for (s=STBI__ZFAST_BITS+1; ; ++s)
      if (k < z->maxcode[s])
         break;
   if (s >= 16) return -1; // invalid code!
   // code size is s, so:
   b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
   if (b >= sizeof (z->size)) return -1; // some data was corrupt somewhere!
   if (z->size[b] != s) return -1;  // was originally an assert, but report failure instead.
   a->code_buffer >>= s;
   a->num_bits -= s;
   return z->value[b];
}

stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
{
   int b,s;
   if (a->num_bits < 16) {
      if (stbi__zeof(a)) {
         return -1;   /* report error for unexpected end of data. */
      }
      stbi__fill_bits(a);
   }
   b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
   if (b) {
      s = b >> 9;
      a->code_buffer >>= s;
      a->num_bits -= s;
      return b & 511;
   }
   return stbi__zhuffman_decode_slowpath(a, z);
}
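
/*
Sketch (not part of stb_image; names are hypothetical): stbi__zhuffman_decode
first indexes z->fast with the next STBI__ZFAST_BITS input bits. Each entry
packs (code length << 9) | symbol, and a zero entry sends decoding to the slow
path. Because input is read LSB-first, a code is stored bit-reversed and
replicated for every value of the index bits it does not cover, as in
stbi__zbuild_huffman. The sketch uses a 4-bit table and three codes from the
RFC 1951 example (A = 010, F = 00, G = 1110).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FAST_BITS 4

// Entry layout: (code length << 9) | symbol; 0 means "not in the fast table".
static uint16_t sketch_fast[1 << SKETCH_FAST_BITS];

static int sketch_rev(int v, int bits)
{
   int r = 0;
   while (bits--) { r = (r << 1) | (v & 1); v >>= 1; }
   return r;
}

// Register symbol with canonical code `code` of length `len` (len <= 4 here).
static void sketch_add(int symbol, int code, int len)
{
   int j = sketch_rev(code, len);
   assert(len >= 1 && len <= SKETCH_FAST_BITS);
   for (; j < (1 << SKETCH_FAST_BITS); j += 1 << len)
      sketch_fast[j] = (uint16_t) ((len << 9) | symbol);
}

// Decode one symbol from the low bits of bitbuf; *used receives its length.
static int sketch_decode(uint32_t bitbuf, int *used)
{
   uint16_t e = sketch_fast[bitbuf & ((1 << SKETCH_FAST_BITS) - 1)];
   if (!e) return -1; // a real decoder would try a slow path here
   *used = e >> 9;
   return e & 511;
}
```

Code G = 1110 arrives as bits 1,1,1,0 from the LSB up, i.e. index 7, so it is
stored at fast[7].
*/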

static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
{
   char *q;
   unsigned int cur, limit, old_limit;
   z->zout = zout;
   if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
   cur   = (unsigned int) (z->zout - z->zout_start);
   limit = old_limit = (unsigned) (z->zout_end - z->zout_start);
   if (UINT_MAX - cur < (unsigned) n) return stbi__err("outofmem", "Out of memory");
   while (cur + n > limit) {
      if(limit > UINT_MAX / 2) return stbi__err("outofmem", "Out of memory");
      limit *= 2;
   }
   q = (char *) STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
   STBI_NOTUSED(old_limit);
   if (q == NULL) return stbi__err("outofmem", "Out of memory");
   z->zout_start = q;
   z->zout       = q + cur;
   z->zout_end   = q + limit;
   return 1;
}
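
/*
Sketch (not part of stb_image; names are hypothetical): stbi__zexpand grows the
output buffer by doubling, refusing any size whose arithmetic would wrap an
unsigned int. The helper below computes only the new capacity with the same two
overflow checks; it requires a non-zero starting limit, since doubling zero
would never make room.

```c
#include <assert.h>
#include <limits.h>

// Smallest limit * 2^k that holds cur + n bytes, or 0 if it would overflow.
static unsigned sketch_grow_limit(unsigned cur, unsigned limit, unsigned n)
{
   assert(limit > 0);
   if (UINT_MAX - cur < n) return 0;      // cur + n itself would wrap
   while (cur + n > limit) {
      if (limit > UINT_MAX / 2) return 0; // doubling would wrap
      limit *= 2;
   }
   return limit;
}
```

Doubling keeps the total copying cost linear in the final output size.
*/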

static const int stbi__zlength_base[31] = {
   3,4,5,6,7,8,9,10,11,13,
   15,17,19,23,27,31,35,43,51,59,
   67,83,99,115,131,163,195,227,258,0,0 };

static const int stbi__zlength_extra[31]=
{ 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };

static const int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};

static const int stbi__zdist_extra[32] =
{ 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
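
/*
Sketch (not part of stb_image; names are hypothetical): length symbols 257..285
and distance symbols 0..29 each select a base value plus a count of extra bits
read from the stream, per RFC 1951, section 3.2.5; for example symbol 265 means
length 11 or 12 depending on one extra bit. The tables above also carry entries
for length symbols 286/287 and distance symbols 30/31, which take part in code
construction but must not appear in compressed data, so the sketch keeps only
the 29 valid length entries.

```c
#include <assert.h>

static const int sketch_len_base[29] = {
   3,4,5,6,7,8,9,10,11,13,15,17,19,23,27,31,35,43,51,59,
   67,83,99,115,131,163,195,227,258 };
static const int sketch_len_extra[29] = {
   0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0 };

// Report the base length of a length symbol and the largest value that
// base + extra bits can express. Returns 0 for symbols outside 257..285.
static int sketch_length_range(int symbol, int *lo, int *hi)
{
   int z = symbol - 257;
   if (z < 0 || z >= 29) return 0;
   *lo = sketch_len_base[z];
   *hi = sketch_len_base[z] + (1 << sketch_len_extra[z]) - 1;
   return 1;
}
```

The distance tables work the same way: symbol 4 has base 5 and one extra bit,
so it covers distances 5 and 6.
*/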

static int stbi__parse_huffman_block(stbi__zbuf *a)
{
   char *zout = a->zout;
   for(;;) {
      int z = stbi__zhuffman_decode(a, &a->z_length);
      if (z < 256) {
         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
         if (zout >= a->zout_end) {
            if (!stbi__zexpand(a, zout, 1)) return 0;
            zout = a->zout;
         }
         *zout++ = (char) z;
      } else {
         stbi_uc *p;
         int len,dist;
         if (z == 256) {
            a->zout = zout;
            return 1;
         }
         z -= 257;
         if (z >= 29) return stbi__err("bad huffman code","Corrupt PNG"); // length codes 286 and 287 must not appear in compressed data
         len = stbi__zlength_base[z];
         if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
         z = stbi__zhuffman_decode(a, &a->z_distance);
         if (z < 0 || z >= 30) return stbi__err("bad huffman code","Corrupt PNG"); // distance codes 30 and 31 must not appear in compressed data
         dist = stbi__zdist_base[z];
         if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
         if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
         if (zout + len > a->zout_end) {
            if (!stbi__zexpand(a, zout, len)) return 0;
            zout = a->zout;
         }
         p = (stbi_uc *) (zout - dist);
         if (dist == 1) { // run of one byte; common in images.
            stbi_uc v = *p;
            if (len) { do *zout++ = v; while (--len); }
         } else {
            if (len) { do *zout++ = *p++; while (--len); }
         }
      }
   }
}
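
/*
Sketch (not part of stb_image; names are hypothetical): the back-reference copy
in stbi__parse_huffman_block moves one byte at a time on purpose. When dist is
smaller than len, the source range overlaps bytes written by the same copy,
which is how DEFLATE encodes runs (dist == 1 repeats the previous byte len
times). memcpy does not allow overlap, and memmove would read the
not-yet-written bytes instead of repeating the pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Append `len` bytes copied from `dist` bytes back; out[0..*pos) is the output
// so far and cap is the buffer size. Byte-by-byte copying makes overlapping
// matches repeat the pattern. Returns 0 on an invalid distance or overflow.
static int sketch_lz77_copy(unsigned char *out, size_t *pos, size_t cap,
                            size_t dist, size_t len)
{
   size_t i;
   if (dist == 0 || dist > *pos || *pos + len > cap) return 0;
   for (i = 0; i < len; ++i, ++*pos)
      out[*pos] = out[*pos - dist];
   return 1;
}
```

Starting from "ab", a match with dist 2 and len 5 produces "abababa".
*/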

static int stbi__compute_huffman_codes(stbi__zbuf *a)
{
   static const stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
   stbi__zhuffman z_codelength;
   stbi_uc lencodes[286+32+137];//padding for maximum single op
   stbi_uc codelength_sizes[19];
   int i,n;

   int hlit  = stbi__zreceive(a,5) + 257;
   int hdist = stbi__zreceive(a,5) + 1;
   int hclen = stbi__zreceive(a,4) + 4;
   int ntot  = hlit + hdist;

   memset(codelength_sizes, 0, sizeof(codelength_sizes));
   for (i=0; i < hclen; ++i) {
      int s = stbi__zreceive(a,3);
      codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
   }
   if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;

   n = 0;
   while (n < ntot) {
      int c = stbi__zhuffman_decode(a, &z_codelength);
      if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
      if (c < 16)
         lencodes[n++] = (stbi_uc) c;
      else {
         stbi_uc fill = 0;
         if (c == 16) {
            c = stbi__zreceive(a,2)+3;
            if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG");
            fill = lencodes[n-1];
         } else if (c == 17) {
            c = stbi__zreceive(a,3)+3;
         } else if (c == 18) {
            c = stbi__zreceive(a,7)+11;
         } else {
            return stbi__err("bad codelengths", "Corrupt PNG");
         }
         if (ntot - n < c) return stbi__err("bad codelengths", "Corrupt PNG");
         memset(lencodes+n, fill, c);
         n += c;
      }
   }
   if (n != ntot) return stbi__err("bad codelengths","Corrupt PNG");
   if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
   if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
   return 1;
}
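
/*
Sketch (not part of stb_image; names are hypothetical): in a dynamic block the
literal/length and distance code lengths arrive as one sequence over a
19-symbol alphabet. Symbols 0..15 are lengths, 16 repeats the previous length
3..6 times (2 extra bits), 17 writes 3..10 zeros (3 extra bits) and 18 writes
11..138 zeros (7 extra bits). The sketch covers only the expansion, taking
already-decoded (symbol, extra-bit value) pairs instead of a bit stream, and
applies the same checks as the loop above.

```c
#include <assert.h>
#include <string.h>

// Expand code-length symbols into out (capacity ntot). extra[i] is the value
// of symbol i's extra bits (ignored for 0..15). Returns the number of lengths
// produced, or -1 on malformed input.
static int sketch_expand_lengths(const int *sym, const int *extra, int count,
                                 unsigned char *out, int ntot)
{
   int n = 0, i;
   for (i = 0; i < count; ++i) {
      int c = sym[i], rep;
      unsigned char fill = 0;
      if (c >= 0 && c < 16) {
         if (n >= ntot) return -1;
         out[n++] = (unsigned char) c;
         continue;
      }
      if (c == 16) {
         if (n == 0) return -1;      // nothing to repeat yet
         rep = 3 + extra[i];
         fill = out[n - 1];
      } else if (c == 17) {
         rep = 3 + extra[i];
      } else if (c == 18) {
         rep = 11 + extra[i];
      } else {
         return -1;                  // not a code-length symbol
      }
      if (ntot - n < rep) return -1; // run would overflow the table
      memset(out + n, fill, (size_t) rep);
      n += rep;
   }
   return n;
}
```
*/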

static int stbi__parse_uncompressed_block(stbi__zbuf *a)
{
   stbi_uc header[4];
   int len,nlen,k;
   if (a->num_bits & 7)
      stbi__zreceive(a, a->num_bits & 7); // discard
   // drain the bit-packed data into header
   k = 0;
   while (a->num_bits > 0) {
      header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
      a->code_buffer >>= 8;
      a->num_bits -= 8;
   }
   if (a->num_bits < 0) return stbi__err("zlib corrupt","Corrupt PNG");
   // now fill header the normal way
   while (k < 4)
      header[k++] = stbi__zget8(a);
   len  = header[1] * 256 + header[0];
   nlen = header[3] * 256 + header[2];
   if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
   if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
   if (a->zout + len > a->zout_end)
      if (!stbi__zexpand(a, a->zout, len)) return 0;
   memcpy(a->zout, a->zbuffer, len);
   a->zbuffer += len;
   a->zout += len;
   return 1;
}
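
/*
Sketch (not part of stb_image; names are hypothetical): a stored block begins
at the next byte boundary with LEN and NLEN, two little-endian 16-bit fields
where NLEN is the one's complement of LEN, followed by LEN raw bytes. The
helper checks that header on a byte buffer and assumes the caller has already
reached a byte boundary, the part stbi__parse_uncompressed_block handles by
draining code_buffer.

```c
#include <assert.h>
#include <stddef.h>

// Parse the 4-byte stored-block header at p (avail bytes available). On success
// store the payload length in *len and return 1; return 0 if the header is
// truncated, LEN/NLEN disagree, or the payload would run past the buffer.
static int sketch_stored_header(const unsigned char *p, size_t avail, unsigned *len)
{
   unsigned l, nl;
   if (avail < 4) return 0;
   l  = p[0] | (p[1] << 8);
   nl = p[2] | (p[3] << 8);
   if (nl != (l ^ 0xffffu)) return 0;
   if (avail - 4 < l) return 0;
   *len = l;
   return 1;
}
```
*/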

static int stbi__parse_zlib_header(stbi__zbuf *a)
{
   int cmf   = stbi__zget8(a);
   int cm    = cmf & 15;
   /* int cinfo = cmf >> 4; */
   int flg   = stbi__zget8(a);
   if (stbi__zeof(a)) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
   if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
   if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
   if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
   // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
   return 1;
}
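
/*
Sketch (not part of stb_image; names are hypothetical): the two-byte zlib
header (RFC 1950) is CMF, holding the method in its low nibble (8 = DEFLATE)
and the window size in its high nibble, followed by FLG, chosen so that
CMF*256 + FLG is a multiple of 31; FLG bit 5 (FDICT) announces a preset
dictionary, which PNG does not allow. The sketch adds the RFC's CINFO <= 7
rule, which stbi__parse_zlib_header skips.

```c
#include <assert.h>

// Returns 1 if cmf/flg form a zlib header acceptable for PNG, else 0.
static int sketch_zlib_header_ok(unsigned char cmf, unsigned char flg)
{
   if ((cmf * 256 + flg) % 31 != 0) return 0; // FCHECK
   if (flg & 0x20) return 0;                   // FDICT: preset dictionary
   if ((cmf & 15) != 8) return 0;              // CM must be 8 (DEFLATE)
   if ((cmf >> 4) > 7) return 0;               // CINFO above 7 is invalid
   return 1;
}
```

0x78 0x9C is a common valid header: 0x789C = 30876 = 31 * 996.
*/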

static const stbi_uc stbi__zdefault_length[288] =
{
   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8
};
static const stbi_uc stbi__zdefault_distance[32] =
{
   5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
};
/*
Init algorithm:
{
   int i;   // use <= to match clearly with spec
   for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
   for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
   for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
   for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;

   for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
}
*/

static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
{
   int final, type;
   if (parse_header)
      if (!stbi__parse_zlib_header(a)) return 0;
   a->num_bits = 0;
   a->code_buffer = 0;
   do {
      final = stbi__zreceive(a,1);
      type = stbi__zreceive(a,2);
      if (type == 0) {
         if (!stbi__parse_uncompressed_block(a)) return 0;
      } else if (type == 3) {
         return 0;
      } else {
         if (type == 1) {
            // use fixed code lengths
            if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , 288)) return 0;
            if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
         } else {
            if (!stbi__compute_huffman_codes(a)) return 0;
         }
         if (!stbi__parse_huffman_block(a)) return 0;
      }
   } while (!final);
   return 1;
}

static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
{
   a->zout_start = obuf;
   a->zout       = obuf;
   a->zout_end   = obuf + olen;
   a->z_expandable = exp;

   return stbi__parse_zlib(a, parse_header);
}

STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
{
   stbi__zbuf a;
   char *p = (char *) stbi__malloc(initial_size);
   if (p == NULL) return NULL;
   a.zbuffer = (stbi_uc *) buffer;
   a.zbuffer_end = (stbi_uc *) buffer + len;
   if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
      if (outlen) *outlen = (int) (a.zout - a.zout_start);
      return a.zout_start;
   } else {
      STBI_FREE(a.zout_start);
      return NULL;
   }
}

STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
{
   return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
}

STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
{
   stbi__zbuf a;
   char *p = (char *) stbi__malloc(initial_size);
   if (p == NULL) return NULL;
   a.zbuffer = (stbi_uc *) buffer;
   a.zbuffer_end = (stbi_uc *) buffer + len;
   if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
      if (outlen) *outlen = (int) (a.zout - a.zout_start);
      return a.zout_start;
   } else {
      STBI_FREE(a.zout_start);
      return NULL;
   }
}

STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
{
   stbi__zbuf a;
   a.zbuffer = (stbi_uc *) ibuffer;
   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
      return (int) (a.zout - a.zout_start);
   else
      return -1;
}

STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
{
   stbi__zbuf a;
   char *p = (char *) stbi__malloc(16384);
   if (p == NULL) return NULL;
   a.zbuffer = (stbi_uc *) buffer;
   a.zbuffer_end = (stbi_uc *) buffer+len;
   if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
      if (outlen) *outlen = (int) (a.zout - a.zout_start);
      return a.zout_start;
   } else {
      STBI_FREE(a.zout_start);
      return NULL;
   }
}

STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
{
   stbi__zbuf a;
   a.zbuffer = (stbi_uc *) ibuffer;
   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
      return (int) (a.zout - a.zout_start);
   else
      return -1;
}
#endif

// public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
//    simple implementation
//      - only 8-bit samples
//      - no CRC checking
//      - allocates lots of intermediate memory
//        - avoids problem of streaming data between subsystems
//        - avoids explicit window management
//    performance
//      - uses stb_zlib, a PD zlib implementation with fast huffman decoding

#ifndef STBI_NO_PNG
typedef struct
{
   stbi__uint32 length;
   stbi__uint32 type;
} stbi__pngchunk;

static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
{
   stbi__pngchunk c;
   c.length = stbi__get32be(s);
   c.type   = stbi__get32be(s);
   return c;
}

static int stbi__check_png_header(stbi__context *s)
{
   static const stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
   int i;
   for (i=0; i < 8; ++i)
      if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
   return 1;
}
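
/*
Sketch (not part of stb_image; names are hypothetical): the signature checked
above is 0x89 'P' 'N' 'G' CR LF 0x1A LF, and every chunk that follows starts
with a 4-byte big-endian length and a 4-byte type, which
stbi__get_chunk_header reads with stbi__get32be. The helper validates the
signature and decodes the first chunk header from a memory buffer; in a valid
file that chunk is IHDR with length 13.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static const unsigned char sketch_png_sig[8] = { 137,80,78,71,13,10,26,10 };

// Read a big-endian 32-bit value.
static uint32_t sketch_be32(const unsigned char *p)
{
   return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
          ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
}

// Check the signature, then decode the chunk header that follows it.
static int sketch_first_chunk(const unsigned char *buf, size_t n,
                              uint32_t *length, uint32_t *type)
{
   if (n < 16 || memcmp(buf, sketch_png_sig, 8) != 0) return 0;
   *length = sketch_be32(buf + 8);
   *type   = sketch_be32(buf + 12);
   return 1;
}
```
*/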

typedef struct
{
   stbi__context *s;
   stbi_uc *idata, *expanded, *out;
   int depth;
} stbi__png;


enum {
   STBI__F_none=0,
   STBI__F_sub=1,
   STBI__F_up=2,
   STBI__F_avg=3,
   STBI__F_paeth=4,
   // synthetic filters used for first scanline to avoid needing a dummy row of 0s
   STBI__F_avg_first,
   STBI__F_paeth_first
};

static stbi_uc first_row_filter[5] =
{
   STBI__F_none,
   STBI__F_sub,
   STBI__F_none,
   STBI__F_avg_first,
   STBI__F_paeth_first
};

static int stbi__paeth(int a, int b, int c)
{
   int p = a + b - c;
   int pa = abs(p-a);
   int pb = abs(p-b);
   int pc = abs(p-c);
   if (pa <= pb && pa <= pc) return a;
   if (pb <= pc) return b;
   return c;
}
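
/*
Sketch (not part of stb_image; names are hypothetical): stbi__create_png_image_raw
below undoes the five PNG filter types with loops specialized by filter and
depth, and first_row_filter swaps in variants for row 0 because the PNG
specification treats the row above the first as all zeros. The unoptimized
version here reconstructs one scanline straight from the definitions; bpp is
bytes per complete pixel, at least 1. It also shows why Paeth on the first row
reduces to Sub: with b = c = 0 the predictor always returns a.

```c
#include <assert.h>
#include <stdlib.h>

// Paeth predictor: whichever of left (a), above (b), upper-left (c) is closest
// to a + b - c, with ties resolved in the order a, b, c.
static int sketch_paeth(int a, int b, int c)
{
   int p = a + b - c;
   int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
   if (pa <= pb && pa <= pc) return a;
   if (pb <= pc) return b;
   return c;
}

// Undo one filter on a scanline of len bytes. prior is the previous
// reconstructed row, or NULL for the first row (treated as all zeros).
// Returns 0 for an invalid filter type.
static int sketch_unfilter(int filter, const unsigned char *raw, const unsigned char *prior,
                           unsigned char *cur, int len, int bpp)
{
   int k;
   for (k = 0; k < len; ++k) {
      int a = k >= bpp ? cur[k - bpp] : 0;               // left
      int b = prior ? prior[k] : 0;                       // above
      int c = (prior && k >= bpp) ? prior[k - bpp] : 0;   // upper-left
      int pred;
      switch (filter) {
         case 0: pred = 0; break;                      // None
         case 1: pred = a; break;                      // Sub
         case 2: pred = b; break;                      // Up
         case 3: pred = (a + b) >> 1; break;           // Average
         case 4: pred = sketch_paeth(a, b, c); break;  // Paeth
         default: return 0;
      }
      cur[k] = (unsigned char) (raw[k] + pred);        // arithmetic modulo 256
   }
   return 1;
}
```
*/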

static const stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
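
/*
Sketch (not part of stb_image; names are hypothetical): for grayscale images
below 8 bits per sample, stbi__depth_scale_table gives the factor that maps the
largest sample to 255 (1-bit times 0xff, 2-bit times 0x55 since 3 times 85 is
255, 4-bit times 0x11 since 15 times 17 is 255). Samples are packed
most-significant-bit first within each byte. The helper unpacks one run of
samples; like the decoder, it scales only grayscale data (color type 0), since
palette indices must stay as they are.

```c
#include <assert.h>

// Unpack count samples of depth bits (1, 2 or 4), stored MSB-first, into one
// byte each. When gray is nonzero each value is scaled so the largest sample
// becomes 255; otherwise values are copied unscaled.
static void sketch_unpack(const unsigned char *in, int count, int depth, int gray,
                          unsigned char *out)
{
   static const unsigned char scale_for_depth[5] = { 0, 0xff, 0x55, 0, 0x11 };
   int per_byte, mask, i;
   unsigned char scale;
   assert(depth == 1 || depth == 2 || depth == 4);
   per_byte = 8 / depth;
   mask = (1 << depth) - 1;
   scale = gray ? scale_for_depth[depth] : 1;
   for (i = 0; i < count; ++i) {
      int shift = 8 - depth * (i % per_byte + 1); // first sample sits in the top bits
      out[i] = (unsigned char) (scale * ((in[i / per_byte] >> shift) & mask));
   }
}
```
*/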
4537
4538 // create the png data from post-deflated data
4539 static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
4540 {
4541    int bytes = (depth == 16? 2 : 1);
4542    stbi__context *s = a->s;
4543    stbi__uint32 i,j,stride = x*out_n*bytes;
4544    stbi__uint32 img_len, img_width_bytes;
4545    int k;
4546    int img_n = s->img_n; // copy it into a local for later
4547
4548    int output_bytes = out_n*bytes;
4549    int filter_bytes = img_n*bytes;
4550    int width = x;
4551
4552    STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
4553    a->out = (stbi_uc *) stbi__malloc_mad3(x, y, output_bytes, 0); // extra bytes to write off the end into
4554    if (!a->out) return stbi__err("outofmem", "Out of memory");
4555
4556    if (!stbi__mad3sizes_valid(img_n, x, depth, 7)) return stbi__err("too large", "Corrupt PNG");
4557    img_width_bytes = (((img_n * x * depth) + 7) >> 3);
4558    img_len = (img_width_bytes + 1) * y;
4559
4560    // we used to check for exact match between raw_len and img_len on non-interlaced PNGs,
4561    // but issue #276 reported a PNG in the wild that had extra data at the end (all zeros),
4562    // so just check for raw_len < img_len always.
4563    if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
4564
4565    for (j=0; j < y; ++j) {
4566       stbi_uc *cur = a->out + stride*j;
4567       stbi_uc *prior;
4568       int filter = *raw++;
4569
4570       if (filter > 4)
4571          return stbi__err("invalid filter","Corrupt PNG");
4572
4573       if (depth < 8) {
4574          if (img_width_bytes > x) return stbi__err("invalid width","Corrupt PNG");
4575          cur += x*out_n - img_width_bytes; // store output to the rightmost img_len bytes, so we can decode in place
4576          filter_bytes = 1;
4577          width = img_width_bytes;
4578       }
4579       prior = cur - stride; // bugfix: need to compute this after 'cur +=' computation above
4580
4581       // if first row, use special filter that doesn't sample previous row
4582       if (j == 0) filter = first_row_filter[filter];
4583
4584       // handle first byte explicitly
4585       for (k=0; k < filter_bytes; ++k) {
4586          switch (filter) {
4587             case STBI__F_none       : cur[k] = raw[k]; break;
4588             case STBI__F_sub        : cur[k] = raw[k]; break;
4589             case STBI__F_up         : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4590             case STBI__F_avg        : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
4591             case STBI__F_paeth      : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
4592             case STBI__F_avg_first  : cur[k] = raw[k]; break;
4593             case STBI__F_paeth_first: cur[k] = raw[k]; break;
4594          }
4595       }
4596
4597       if (depth == 8) {
4598          if (img_n != out_n)
4599             cur[img_n] = 255; // first pixel
4600          raw += img_n;
4601          cur += out_n;
4602          prior += out_n;
4603       } else if (depth == 16) {
4604          if (img_n != out_n) {
4605             cur[filter_bytes]   = 255; // first pixel top byte
4606             cur[filter_bytes+1] = 255; // first pixel bottom byte
4607          }
4608          raw += filter_bytes;
4609          cur += output_bytes;
4610          prior += output_bytes;
4611       } else {
4612          raw += 1;
4613          cur += 1;
4614          prior += 1;
4615       }
4616
4617       // this is a little gross, so that we don't switch per-pixel or per-component
4618       if (depth < 8 || img_n == out_n) {
4619          int nk = (width - 1)*filter_bytes;
4620          #define STBI__CASE(f) \
4621              case f:     \
4622                 for (k=0; k < nk; ++k)
4623          switch (filter) {
4624             // "none" filter turns into a memcpy here; make that explicit.
4625             case STBI__F_none:         memcpy(cur, raw, nk); break;
4626             STBI__CASE(STBI__F_sub)          { cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); } break;
4627             STBI__CASE(STBI__F_up)           { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
4628             STBI__CASE(STBI__F_avg)          { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); } break;
4629             STBI__CASE(STBI__F_paeth)        { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); } break;
4630             STBI__CASE(STBI__F_avg_first)    { cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); } break;
4631             STBI__CASE(STBI__F_paeth_first)  { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); } break;
4632          }
4633          #undef STBI__CASE
4634          raw += nk;
4635       } else {
4636          STBI_ASSERT(img_n+1 == out_n);
4637          #define STBI__CASE(f) \
4638              case f:     \
4639                 for (i=x-1; i >= 1; --i, cur[filter_bytes]=255,raw+=filter_bytes,cur+=output_bytes,prior+=output_bytes) \
4640                    for (k=0; k < filter_bytes; ++k)
4641          switch (filter) {
4642             STBI__CASE(STBI__F_none)         { cur[k] = raw[k]; } break;
4643             STBI__CASE(STBI__F_sub)          { cur[k] = STBI__BYTECAST(raw[k] + cur[k- output_bytes]); } break;
4644             STBI__CASE(STBI__F_up)           { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
4645             STBI__CASE(STBI__F_avg)          { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k- output_bytes])>>1)); } break;
4646             STBI__CASE(STBI__F_paeth)        { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],prior[k],prior[k- output_bytes])); } break;
4647             STBI__CASE(STBI__F_avg_first)    { cur[k] = STBI__BYTECAST(raw[k] + (cur[k- output_bytes] >> 1)); } break;
4648             STBI__CASE(STBI__F_paeth_first)  { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],0,0)); } break;
4649          }
4650          #undef STBI__CASE
4651
4652          // the loop above sets the high byte of the pixels' alpha, but for
4653          // 16 bit png files we also need the low byte set. we'll do that here.
4654          if (depth == 16) {
4655             cur = a->out + stride*j; // start at the beginning of the row again
4656             for (i=0; i < x; ++i,cur+=output_bytes) {
4657                cur[filter_bytes+1] = 255;
4658             }
4659          }
4660       }
4661    }
4662
4663    // we make a separate pass to expand bits to pixels; for performance,
4664    // this could run two scanlines behind the above code, so it won't
4665    // intefere with filtering but will still be in the cache.
4666    if (depth < 8) {
4667       for (j=0; j < y; ++j) {
4668          stbi_uc *cur = a->out + stride*j;
4669          stbi_uc *in  = a->out + stride*j + x*out_n - img_width_bytes;
4670          // unpack 1/2/4-bit into a 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
4671          // png guarante byte alignment, if width is not multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop
4672          stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
4673
4674          // note that the final byte might overshoot and write more data than desired.
4675          // we can allocate enough data that this never writes out of memory, but it
4676          // could also overwrite the next scanline. can it overwrite non-empty data
4677          // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
4678          // so we need to explicitly clamp the final ones
4679
4680          if (depth == 4) {
4681             for (k=x*img_n; k >= 2; k-=2, ++in) {
4682                *cur++ = scale * ((*in >> 4)       );
4683                *cur++ = scale * ((*in     ) & 0x0f);
4684             }
4685             if (k > 0) *cur++ = scale * ((*in >> 4)       );
4686          } else if (depth == 2) {
4687             for (k=x*img_n; k >= 4; k-=4, ++in) {
4688                *cur++ = scale * ((*in >> 6)       );
4689                *cur++ = scale * ((*in >> 4) & 0x03);
4690                *cur++ = scale * ((*in >> 2) & 0x03);
4691                *cur++ = scale * ((*in     ) & 0x03);
4692             }
4693             if (k > 0) *cur++ = scale * ((*in >> 6)       );
4694             if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
4695             if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
4696          } else if (depth == 1) {
4697             for (k=x*img_n; k >= 8; k-=8, ++in) {
4698                *cur++ = scale * ((*in >> 7)       );
4699                *cur++ = scale * ((*in >> 6) & 0x01);
4700                *cur++ = scale * ((*in >> 5) & 0x01);
4701                *cur++ = scale * ((*in >> 4) & 0x01);
4702                *cur++ = scale * ((*in >> 3) & 0x01);
4703                *cur++ = scale * ((*in >> 2) & 0x01);
4704                *cur++ = scale * ((*in >> 1) & 0x01);
4705                *cur++ = scale * ((*in     ) & 0x01);
4706             }
4707             if (k > 0) *cur++ = scale * ((*in >> 7)       );
4708             if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
4709             if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
4710             if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
4711             if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
4712             if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
4713             if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
4714          }
4715          if (img_n != out_n) {
4716             int q;
4717             // insert alpha = 255
4718             cur = a->out + stride*j;
4719             if (img_n == 1) {
4720                for (q=x-1; q >= 0; --q) {
4721                   cur[q*2+1] = 255;
4722                   cur[q*2+0] = cur[q];
4723                }
4724             } else {
4725                STBI_ASSERT(img_n == 3);
4726                for (q=x-1; q >= 0; --q) {
4727                   cur[q*4+3] = 255;
4728                   cur[q*4+2] = cur[q*3+2];
4729                   cur[q*4+1] = cur[q*3+1];
4730                   cur[q*4+0] = cur[q*3+0];
4731                }
4732             }
4733          }
4734       }
4735    } else if (depth == 16) {
4736       // force the image data from big-endian to platform-native.
4737       // this is done in a separate pass due to the decoding relying
4738       // on the data being untouched, but could probably be done
4739       // per-line during decode if care is taken.
4740       stbi_uc *cur = a->out;
4741       stbi__uint16 *cur16 = (stbi__uint16*)cur;
4742
4743       for(i=0; i < x*y*out_n; ++i,cur16++,cur+=2) {
4744          *cur16 = (cur[0] << 8) | cur[1];
4745       }
4746    }
4747
4748    return 1;
4749 }
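The depth 1/2/4 branches above unpack samples most-significant-bit first and multiply each one by `scale`, so the largest code lands on 255. The standalone sketch below reproduces that unpacking over a plain byte buffer. `unpack_low_depth` is a hypothetical name. The factors 0xff, 0x55 and 0x11 are assumed to be the grayscale scale values for depths 1, 2 and 4; each satisfies max_code * factor == 255.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: unpack `count` samples of `depth` bits (1, 2 or 4),
   packed most-significant-bit first as in PNG scanlines, into one byte per
   sample rescaled to 0..255. Trailing pad bits in the last byte are ignored. */
static void unpack_low_depth(const uint8_t *in, size_t count, int depth, uint8_t *out)
{
   /* max code * scale == 255: 1 * 0xff, 3 * 0x55, 15 * 0x11 */
   unsigned scale = (depth == 1) ? 0xffu : (depth == 2) ? 0x55u : 0x11u;
   unsigned mask  = (1u << depth) - 1u;
   size_t per_byte = (size_t) (8 / depth);
   size_t i;

   for (i = 0; i < count; ++i) {
      unsigned byte  = in[i / per_byte];
      /* first sample in a byte uses the top bits */
      unsigned shift = 8u - (unsigned) depth * (unsigned) (i % per_byte + 1);
      out[i] = (uint8_t) (scale * ((byte >> shift) & mask));
   }
}
```

As with the explicit `k > 0 ... k > 6` tails in the original, only the samples that belong to the row are emitted, even when the last byte is partly padding.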
4750
4751 static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
4752 {
4753    int bytes = (depth == 16 ? 2 : 1);
4754    int out_bytes = out_n * bytes;
4755    stbi_uc *final;
4756    int p;
4757    if (!interlaced)
4758       return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
4759
4760    // de-interlacing
4761    final = (stbi_uc *) stbi__malloc_mad3(a->s->img_x, a->s->img_y, out_bytes, 0); if (!final) return stbi__err("outofmem", "Out of memory");
4762    for (p=0; p < 7; ++p) {
4763       int xorig[] = { 0,4,0,2,0,1,0 };
4764       int yorig[] = { 0,0,4,0,2,0,1 };
4765       int xspc[]  = { 8,8,4,4,2,2,1 };
4766       int yspc[]  = { 8,8,8,4,4,2,2 };
4767       int i,j,x,y;
4768       // pass size = ceil((dim - orig) / spc); e.g. pass p=1 is 0, 1, 1 columns wide for img_x = 4, 5, 12
4769       x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
4770       y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
4771       if (x && y) {
4772          stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
4773          if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
4774             STBI_FREE(final);
4775             return 0;
4776          }
4777          for (j=0; j < y; ++j) {
4778             for (i=0; i < x; ++i) {
4779                int out_y = j*yspc[p]+yorig[p];
4780                int out_x = i*xspc[p]+xorig[p];
4781                memcpy(final + out_y*a->s->img_x*out_bytes + out_x*out_bytes,
4782                       a->out + (j*x+i)*out_bytes, out_bytes);
4783             }
4784          }
4785          STBI_FREE(a->out);
4786          image_data += img_len;
4787          image_data_len -= img_len;
4788       }
4789    }
4790    a->out = final;
4791
4792    return 1;
4793 }
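Each Adam7 pass is decoded as a small standalone image. Its width and height are the ceiling of (dimension - origin) / spacing. Its share of the inflated stream is one filter byte plus the packed sample bytes for every row. A sketch of both computations follows. It reuses the origin and spacing tables from the loop above, and the helper names are hypothetical.

```c
#include <assert.h>

/* Adam7 origins and spacings, in the same order as the tables above. */
static const int adam7_xorig[7] = { 0,4,0,2,0,1,0 };
static const int adam7_yorig[7] = { 0,0,4,0,2,0,1 };
static const int adam7_xspc[7]  = { 8,8,4,4,2,2,1 };
static const int adam7_yspc[7]  = { 8,8,8,4,4,2,2 };

/* Width/height of the reduced image for pass p (0..6); 0 means the pass is
   empty. For img_x, img_y >= 1 the numerators are never negative. */
static void adam7_pass_size(int p, int img_x, int img_y, int *w, int *h)
{
   *w = (img_x - adam7_xorig[p] + adam7_xspc[p] - 1) / adam7_xspc[p];
   *h = (img_y - adam7_yorig[p] + adam7_yspc[p] - 1) / adam7_yspc[p];
}

/* Bytes the pass occupies in the inflated stream: per row, one filter byte
   plus ceil(img_n * w * depth / 8) sample bytes (mirrors img_len above). */
static unsigned adam7_pass_bytes(int w, int h, int img_n, int depth)
{
   if (w == 0 || h == 0) return 0;  /* empty passes contribute no data */
   return (unsigned) ((((img_n * w * depth) + 7) >> 3) + 1) * (unsigned) h;
}
```

The seven passes together cover every pixel exactly once. For example, the pass areas of an 8x8 image sum to 64, and those of a 5x3 image sum to 15.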
4794
4795 static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
4796 {
4797    stbi__context *s = z->s;
4798    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4799    stbi_uc *p = z->out;
4800
4801    // compute color-based transparency, assuming we've
4802    // already got 255 as the alpha value in the output
4803    STBI_ASSERT(out_n == 2 || out_n == 4);
4804
4805    if (out_n == 2) {
4806       for (i=0; i < pixel_count; ++i) {
4807          p[1] = (p[0] == tc[0] ? 0 : 255);
4808          p += 2;
4809       }
4810    } else {
4811       for (i=0; i < pixel_count; ++i) {
4812          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4813             p[3] = 0;
4814          p += 4;
4815       }
4816    }
4817    return 1;
4818 }
4819
4820 static int stbi__compute_transparency16(stbi__png *z, stbi__uint16 tc[3], int out_n)
4821 {
4822    stbi__context *s = z->s;
4823    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4824    stbi__uint16 *p = (stbi__uint16*) z->out;
4825
4826    // compute color-based transparency, assuming we've
4827    // already got 65535 as the alpha value in the output
4828    STBI_ASSERT(out_n == 2 || out_n == 4);
4829
4830    if (out_n == 2) {
4831       for (i = 0; i < pixel_count; ++i) {
4832          p[1] = (p[0] == tc[0] ? 0 : 65535);
4833          p += 2;
4834       }
4835    } else {
4836       for (i = 0; i < pixel_count; ++i) {
4837          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4838             p[3] = 0;
4839          p += 4;
4840       }
4841    }
4842    return 1;
4843 }
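Color-key transparency only works if the tRNS key is on the same scale as the unpacked samples. That is why the tRNS handler multiplies sub-8-bit keys by `stbi__depth_scale_table`. The sketch below shows the gray + alpha case end to end. The table values { 0, 0xff, 0x55, 0, 0x11, 0, 0, 0, 0x01 } are an assumption, chosen so that each depth's largest code maps to 255. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scale a raw tRNS gray key of `depth` bits to the 0..255 range used by the
   unpacked samples (assumed table: max code * factor == 255 for depths 1/2/4). */
static uint8_t scale_trns_key(uint16_t raw, int depth)
{
   static const uint8_t depth_scale[9] = { 0, 0xff, 0x55, 0, 0x11, 0, 0, 0, 0x01 };
   return (uint8_t) ((raw & 255u) * depth_scale[depth]);
}

/* Gray + alpha pixels: alpha becomes 0 where the gray sample equals the key
   and 255 elsewhere (mirrors the out_n == 2 branch above). */
static void apply_gray_key(uint8_t *px, size_t pixel_count, uint8_t key)
{
   size_t i;
   for (i = 0; i < pixel_count; ++i, px += 2)
      px[1] = (uint8_t) (px[0] == key ? 0 : 255);
}
```

Without the scaling, a 2-bit key of 1 would never match the unpacked sample value 85.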
4844
4845 static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
4846 {
4847    stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
4848    stbi_uc *p, *temp_out, *orig = a->out;
4849
4850    p = (stbi_uc *) stbi__malloc_mad2(pixel_count, pal_img_n, 0);
4851    if (p == NULL) return stbi__err("outofmem", "Out of memory");
4852
4853    // between here and free(out) below, exiting would leak
4854    temp_out = p;
4855
4856    if (pal_img_n == 3) {
4857       for (i=0; i < pixel_count; ++i) {
4858          int n = orig[i]*4;
4859          p[0] = palette[n  ];
4860          p[1] = palette[n+1];
4861          p[2] = palette[n+2];
4862          p += 3;
4863       }
4864    } else {
4865       for (i=0; i < pixel_count; ++i) {
4866          int n = orig[i]*4;
4867          p[0] = palette[n  ];
4868          p[1] = palette[n+1];
4869          p[2] = palette[n+2];
4870          p[3] = palette[n+3];
4871          p += 4;
4872       }
4873    }
4874    STBI_FREE(a->out);
4875    a->out = temp_out;
4876
4877    STBI_NOTUSED(len);
4878
4879    return 1;
4880 }
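Palette expansion is a table lookup. `palette` keeps four bytes per entry, with alpha defaulting to 255 unless tRNS overwrites it. The output copies the first three or four bytes of the selected entry. The sketch below works over plain arrays and uses a hypothetical name. The array is sized for 256 entries, so an index beyond the declared palette length stays in bounds. In stb_image such an index reads an entry that PLTE never filled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand 8-bit palette indices into RGB (out_n == 3) or RGBA (out_n == 4).
   `palette` holds 256 RGBA entries, so every possible index stays in bounds. */
static void expand_palette(const uint8_t *indices, size_t pixel_count,
                           const uint8_t palette[256 * 4], int out_n, uint8_t *out)
{
   size_t i;
   int c;
   for (i = 0; i < pixel_count; ++i) {
      const uint8_t *entry = palette + (size_t) indices[i] * 4;
      for (c = 0; c < out_n; ++c)
         *out++ = entry[c];
   }
}
```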
4881
4882 static int stbi__unpremultiply_on_load = 0;
4883 static int stbi__de_iphone_flag = 0;
4884
4885 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
4886 {
4887    stbi__unpremultiply_on_load = flag_true_if_should_unpremultiply;
4888 }
4889
4890 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
4891 {
4892    stbi__de_iphone_flag = flag_true_if_should_convert;
4893 }
4894
4895 static void stbi__de_iphone(stbi__png *z)
4896 {
4897    stbi__context *s = z->s;
4898    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4899    stbi_uc *p = z->out;
4900
4901    if (s->img_out_n == 3) {  // convert bgr to rgb
4902       for (i=0; i < pixel_count; ++i) {
4903          stbi_uc t = p[0];
4904          p[0] = p[2];
4905          p[2] = t;
4906          p += 3;
4907       }
4908    } else {
4909       STBI_ASSERT(s->img_out_n == 4);
4910       if (stbi__unpremultiply_on_load) {
4911          // convert bgr to rgb and unpremultiply
4912          for (i=0; i < pixel_count; ++i) {
4913             stbi_uc a = p[3];
4914             stbi_uc t = p[0];
4915             if (a) {
4916                stbi_uc half = a / 2;
4917                p[0] = (p[2] * 255 + half) / a;
4918                p[1] = (p[1] * 255 + half) / a;
4919                p[2] = ( t   * 255 + half) / a;
4920             } else {
4921                p[0] = p[2];
4922                p[2] = t;
4923             }
4924             p += 4;
4925          }
4926       } else {
4927          // convert bgr to rgb
4928          for (i=0; i < pixel_count; ++i) {
4929             stbi_uc t = p[0];
4930             p[0] = p[2];
4931             p[2] = t;
4932             p += 4;
4933          }
4934       }
4935    }
4936 }
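Apple's CgBI PNG variant stores premultiplied BGRA. The conversion above swaps B and R, then divides each color channel by alpha with round-to-nearest (the `+ a/2` term). The sketch below handles one pixel and uses hypothetical names. Unlike the loop above, it clamps the quotient to 255. That clamp is an added safeguard for corrupt data where a channel exceeds its alpha; it is not stb_image behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Divide a premultiplied channel by alpha, rounding to nearest; the clamp is
   an added safeguard for channels larger than alpha. Requires a != 0. */
static uint8_t unpremultiply(uint8_t c, uint8_t a)
{
   unsigned v = ((unsigned) c * 255u + a / 2u) / a;
   return (uint8_t) (v > 255u ? 255u : v);
}

/* Convert one premultiplied BGRA pixel in place to straight-alpha RGBA. */
static void bgra_premul_to_rgba(uint8_t p[4])
{
   uint8_t a = p[3], b = p[0];
   if (a) {
      p[0] = unpremultiply(p[2], a);
      p[1] = unpremultiply(p[1], a);
      p[2] = unpremultiply(b, a);
   } else {
      /* fully transparent: nothing to divide, just swap B and R */
      p[0] = p[2];
      p[2] = b;
   }
}
```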
4937
4938 #define STBI__PNG_TYPE(a,b,c,d)  (((unsigned) (a) << 24) + ((unsigned) (b) << 16) + ((unsigned) (c) << 8) + (unsigned) (d))
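STBI__PNG_TYPE packs the four ASCII bytes of a chunk name big-endian into one integer, so a `switch` can compare names directly. Bit 29 of that value is bit 5 of the first byte. The PNG specification defines this as the ancillary bit, which is set when the first letter is lowercase. The `default:` case in the parser rejects an unknown chunk only when that bit is clear, i.e. when the chunk is critical. The macro and function names in this self-contained sketch are local stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Same packing as STBI__PNG_TYPE: first character in the top byte. */
#define PNG_TYPE(a,b,c,d) (((uint32_t) (a) << 24) + ((uint32_t) (b) << 16) + \
                           ((uint32_t) (c) << 8) + (uint32_t) (d))

/* Critical chunks have an uppercase first letter, i.e. bit 5 of byte 0 clear. */
static int png_chunk_is_critical(uint32_t type)
{
   return (type & (1u << 29)) == 0;
}
```

By this rule `CgBI` counts as critical, which is why the parser handles it explicitly rather than skipping it.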
4939
4940 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
4941 {
4942    stbi_uc palette[1024], pal_img_n=0;
4943    stbi_uc has_trans=0, tc[3]={0};
4944    stbi__uint16 tc16[3];
4945    stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
4946    int first=1,k,interlace=0, color=0, is_iphone=0;
4947    stbi__context *s = z->s;
4948
4949    z->expanded = NULL;
4950    z->idata = NULL;
4951    z->out = NULL;
4952
4953    if (!stbi__check_png_header(s)) return 0;
4954
4955    if (scan == STBI__SCAN_type) return 1;
4956
4957    for (;;) {
4958       stbi__pngchunk c = stbi__get_chunk_header(s);
4959       switch (c.type) {
4960          case STBI__PNG_TYPE('C','g','B','I'):
4961             is_iphone = 1;
4962             stbi__skip(s, c.length);
4963             break;
4964          case STBI__PNG_TYPE('I','H','D','R'): {
4965             int comp,filter;
4966             if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
4967             first = 0;
4968             if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
4969             s->img_x = stbi__get32be(s);
4970             s->img_y = stbi__get32be(s);
4971             if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
4972             if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
4973             z->depth = stbi__get8(s);  if (z->depth != 1 && z->depth != 2 && z->depth != 4 && z->depth != 8 && z->depth != 16)  return stbi__err("1/2/4/8/16-bit only","PNG not supported: 1/2/4/8/16-bit only");
4974             color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
4975             if (color == 3 && z->depth == 16)                  return stbi__err("bad ctype","Corrupt PNG");
4976             if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
4977             comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
4978             filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
4979             interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
4980             if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
4981             if (!pal_img_n) {
4982                s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
4983                if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
4984                if (scan == STBI__SCAN_header) return 1;
4985             } else {
4986                // if paletted, then pal_img_n is our final component count, and
4987                // img_n is the number of components to decompress/filter.
4988                s->img_n = 1;
4989                if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
4990                // if SCAN_header, have to scan to see if we have a tRNS
4991             }
4992             break;
4993          }
4994
4995          case STBI__PNG_TYPE('P','L','T','E'):  {
4996             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4997             if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
4998             pal_len = c.length / 3;
4999             if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
5000             for (i=0; i < pal_len; ++i) {
5001                palette[i*4+0] = stbi__get8(s);
5002                palette[i*4+1] = stbi__get8(s);
5003                palette[i*4+2] = stbi__get8(s);
5004                palette[i*4+3] = 255;
5005             }
5006             break;
5007          }
5008
5009          case STBI__PNG_TYPE('t','R','N','S'): {
5010             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5011             if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
5012             if (pal_img_n) {
5013                if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
5014                if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
5015                if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
5016                pal_img_n = 4;
5017                for (i=0; i < c.length; ++i)
5018                   palette[i*4+3] = stbi__get8(s);
5019             } else {
5020                if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
5021                if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
5022                has_trans = 1;
5023                if (z->depth == 16) {
5024                   for (k = 0; k < s->img_n; ++k) tc16[k] = (stbi__uint16)stbi__get16be(s); // copy the values as-is
5025                } else {
5026                   for (k = 0; k < s->img_n; ++k) tc[k] = (stbi_uc)(stbi__get16be(s) & 255) * stbi__depth_scale_table[z->depth]; // scale sub-8-bit keys to the 8-bit range of the unpacked samples
5027                }
5028             }
5029             break;
5030          }
5031
5032          case STBI__PNG_TYPE('I','D','A','T'): {
5033             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5034             if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
5035             if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
5036             if ((int)(ioff + c.length) < (int)ioff) return 0;
5037             if (ioff + c.length > idata_limit) {
5038                stbi__uint32 idata_limit_old = idata_limit;
5039                stbi_uc *p;
5040                if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
5041                while (ioff + c.length > idata_limit)
5042                   idata_limit *= 2;
5043                STBI_NOTUSED(idata_limit_old);
5044                p = (stbi_uc *) STBI_REALLOC_SIZED(z->idata, idata_limit_old, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
5045                z->idata = p;
5046             }
5047             if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
5048             ioff += c.length;
5049             break;
5050          }
5051
5052          case STBI__PNG_TYPE('I','E','N','D'): {
5053             stbi__uint32 raw_len, bpl;
5054             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5055             if (scan != STBI__SCAN_load) return 1;
5056             if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
5057             // initial guess for decoded data size to avoid unnecessary reallocs
5058             bpl = (s->img_x * z->depth + 7) / 8; // bytes per line, per component
5059             raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
5060             z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
5061             if (z->expanded == NULL) return 0; // zlib should set error
5062             STBI_FREE(z->idata); z->idata = NULL;
5063             if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
5064                s->img_out_n = s->img_n+1;
5065             else
5066                s->img_out_n = s->img_n;
5067             if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, z->depth, color, interlace)) return 0;
5068             if (has_trans) {
5069                if (z->depth == 16) {
5070                   if (!stbi__compute_transparency16(z, tc16, s->img_out_n)) return 0;
5071                } else {
5072                   if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
5073                }
5074             }
5075             if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
5076                stbi__de_iphone(z);
5077             if (pal_img_n) {
5078                // pal_img_n == 3 or 4
5079                s->img_n = pal_img_n; // record the actual colors we had
5080                s->img_out_n = pal_img_n;
5081                if (req_comp >= 3) s->img_out_n = req_comp;
5082                if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
5083                   return 0;
5084             } else if (has_trans) {
5085                // non-paletted image with tRNS -> source image has (constant) alpha
5086                ++s->img_n;
5087             }
5088             STBI_FREE(z->expanded); z->expanded = NULL;
5089             // end of PNG chunk, read and skip CRC
5090             stbi__get32be(s);
5091             return 1;
5092          }
5093
5094          default:
5095             // if critical, fail
5096             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5097             if ((c.type & (1 << 29)) == 0) {
5098                #ifndef STBI_NO_FAILURE_STRINGS
5099                // not threadsafe
5100                static char invalid_chunk[] = "XXXX PNG chunk not known";
5101                invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
5102                invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
5103                invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
5104                invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
5105                #endif
5106                return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
5107             }
5108             stbi__skip(s, c.length);
5109             break;
5110       }
5111       // end of PNG chunk, read and skip CRC
5112       stbi__get32be(s);
5113    }
5114 }
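The IEND handler sizes the zlib output buffer from a guess: each row holds one filter-type byte plus the packed samples. For a non-interlaced image, the exact inflated size is img_y * (1 + ceil(img_x * img_n * depth / 8)). The per-component `bpl` formula above agrees with it whenever depth >= 8 or img_n == 1. That covers every legal PNG, because the PNG specification allows sub-8-bit depths only for grayscale and palette images. Interlaced images carry one filter byte per row of each pass, so for them the guess is only a starting hint for the decoder. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Exact inflated size of a non-interlaced PNG: per row, one filter byte
   plus ceil(img_x * img_n * depth / 8) bytes of packed samples. */
static uint32_t png_raw_len_exact(uint32_t img_x, uint32_t img_y, uint32_t img_n, uint32_t depth)
{
   uint32_t row_bytes = (img_x * img_n * depth + 7) / 8;
   return (row_bytes + 1) * img_y;
}

/* The initial guess computed in the IEND case above. */
static uint32_t png_raw_len_guess(uint32_t img_x, uint32_t img_y, uint32_t img_n, uint32_t depth)
{
   uint32_t bpl = (img_x * depth + 7) / 8;  /* bytes per line, per component */
   return bpl * img_y * img_n + img_y;      /* + one filter byte per row */
}
```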
5115
5116 static void *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp, stbi__result_info *ri)
5117 {
5118    void *result=NULL;
5119    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
5120    if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
5121       if (p->depth <= 8)
5122          ri->bits_per_channel = 8;
5123       else if (p->depth == 16)
5124          ri->bits_per_channel = 16;
5125       else
5126          return stbi__errpuc("bad bits_per_channel", "PNG not supported: unsupported color depth");
5127       result = p->out;
5128       p->out = NULL;
5129       if (req_comp && req_comp != p->s->img_out_n) {
5130          if (ri->bits_per_channel == 8)
5131             result = stbi__convert_format((unsigned char *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
5132          else
5133             result = stbi__convert_format16((stbi__uint16 *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
5134          p->s->img_out_n = req_comp;
5135          if (result == NULL) return result;
5136       }
5137       *x = p->s->img_x;
5138       *y = p->s->img_y;
5139       if (n) *n = p->s->img_n;
5140    }
5141    STBI_FREE(p->out);      p->out      = NULL;
5142    STBI_FREE(p->expanded); p->expanded = NULL;
5143    STBI_FREE(p->idata);    p->idata    = NULL;
5144
5145    return result;
5146 }
5147
5148 static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
5149 {
5150    stbi__png p;
5151    p.s = s;
5152    return stbi__do_png(&p, x,y,comp,req_comp, ri);
5153 }
5154
5155 static int stbi__png_test(stbi__context *s)
5156 {
5157    int r;
5158    r = stbi__check_png_header(s);
5159    stbi__rewind(s);
5160    return r;
5161 }
5162
5163 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
5164 {
5165    if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
5166       stbi__rewind( p->s );
5167       return 0;
5168    }
5169    if (x) *x = p->s->img_x;
5170    if (y) *y = p->s->img_y;
5171    if (comp) *comp = p->s->img_n;
5172    return 1;
5173 }
5174
5175 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
5176 {
5177    stbi__png p;
5178    p.s = s;
5179    return stbi__png_info_raw(&p, x, y, comp);
5180 }
5181
5182 static int stbi__png_is16(stbi__context *s)
5183 {
5184    stbi__png p;
5185    p.s = s;
5186    if (!stbi__png_info_raw(&p, NULL, NULL, NULL))
5187       return 0;
5188    if (p.depth != 16) {
5189       stbi__rewind(p.s);
5190       return 0;
5191    }
5192    return 1;
5193 }
5194 #endif
5195
5196 // Microsoft/Windows BMP image
5197
5198 #ifndef STBI_NO_BMP
5199 static int stbi__bmp_test_raw(stbi__context *s)
5200 {
5201    int r;
5202    int sz;
5203    if (stbi__get8(s) != 'B') return 0;
5204    if (stbi__get8(s) != 'M') return 0;
5205    stbi__get32le(s); // discard filesize
5206    stbi__get16le(s); // discard reserved
5207    stbi__get16le(s); // discard reserved
5208    stbi__get32le(s); // discard data offset
5209    sz = stbi__get32le(s);
5210    r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
5211    return r;
5212 }
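stbi__bmp_test_raw recognizes a BMP from two things. The first is the 14-byte file header: "BM", the file size, two reserved 16-bit fields, and the pixel-data offset. The second is the DIB header that follows, whose 32-bit little-endian size must be one of the five supported values. The sketch below performs the same check over an in-memory buffer. `looks_like_bmp` is hypothetical; stb_image reads through its stbi__context instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer-based version of the signature + header-size check. */
static int looks_like_bmp(const uint8_t *buf, size_t len)
{
   uint32_t hsz;
   if (len < 18 || buf[0] != 'B' || buf[1] != 'M') return 0;
   /* bytes 2..13: file size, two reserved fields, data offset (ignored here);
      bytes 14..17: DIB header size, little-endian */
   hsz = (uint32_t) buf[14] | ((uint32_t) buf[15] << 8) |
         ((uint32_t) buf[16] << 16) | ((uint32_t) buf[17] << 24);
   return hsz == 12 || hsz == 40 || hsz == 56 || hsz == 108 || hsz == 124;
}
```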
5213
5214 static int stbi__bmp_test(stbi__context *s)
5215 {
5216    int r = stbi__bmp_test_raw(s);
5217    stbi__rewind(s);
5218    return r;
5219 }
5220
5221
5222 // returns 0..31 for the highest set bit, or -1 if z == 0
5223 static int stbi__high_bit(unsigned int z)
5224 {
5225    int n=0;
5226    if (z == 0) return -1;
5227    if (z >= 0x10000) { n += 16; z >>= 16; }
5228    if (z >= 0x00100) { n +=  8; z >>=  8; }
5229    if (z >= 0x00010) { n +=  4; z >>=  4; }
5230    if (z >= 0x00004) { n +=  2; z >>=  2; }
5231    if (z >= 0x00002) { n +=  1;/* >>=  1;*/ }
5232    return n;
5233 }
5234
5235 static int stbi__bitcount(unsigned int a)
5236 {
5237    a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
5238    a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
5239    a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
5240    a = (a + (a >> 8)); // max 16 per 8 bits
5241    a = (a + (a >> 16)); // max 32 per 8 bits
5242    return a & 0xff;
5243 }
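For bitfield BMPs, the loader derives each channel's shift from stbi__high_bit(mask) - 7 and its width from stbi__bitcount(mask). For an RGB565 green mask 0x07E0, for example, the highest set bit is 10 and the width is 6, so the shift is 3. The sketch below provides simple loop versions as a cross-check for the branchy binary search and the SWAR (parallel add) popcount above. `bitcount_swar` copies the steps of stbi__bitcount. All names are local to the sketch, and a 32-bit `unsigned int` is assumed.

```c
#include <assert.h>

/* Loop reference: index of the highest set bit, or -1 for 0. */
static int high_bit_ref(unsigned int z)
{
   int n = -1;
   while (z) { ++n; z >>= 1; }
   return n;
}

/* Loop reference: number of set bits. */
static int bitcount_ref(unsigned int a)
{
   int n = 0;
   while (a) { n += (int) (a & 1u); a >>= 1; }
   return n;
}

/* Same SWAR steps as stbi__bitcount: add bits in pairs, then nibbles, then
   bytes, then fold the byte sums into the low byte (assumes 32-bit unsigned). */
static int bitcount_swar(unsigned int a)
{
   a = (a & 0x55555555) + ((a >>  1) & 0x55555555);
   a = (a & 0x33333333) + ((a >>  2) & 0x33333333);
   a = (a + (a >> 4)) & 0x0f0f0f0f;
   a = (a + (a >> 8));
   a = (a + (a >> 16));
   return (int) (a & 0xff);
}
```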
5244
5245 // extract an arbitrarily-aligned N-bit value (N=bits)
5246 // from v, and then make it 8-bits long and fractionally
5247 // extend it to the full range.
5248 static int stbi__shiftsigned(unsigned int v, int shift, int bits)
5249 {
5250    static unsigned int mul_table[9] = {
5251       0,
5252       0xff/*0b11111111*/, 0x55/*0b01010101*/, 0x49/*0b01001001*/, 0x11/*0b00010001*/,
5253       0x21/*0b00100001*/, 0x41/*0b01000001*/, 0x81/*0b10000001*/, 0x01/*0b00000001*/,
5254    };
5255    static unsigned int shift_table[9] = {
5256       0, 0,0,1,0,2,4,6,0,
5257    };
5258    if (shift < 0)
5259       v <<= -shift;
5260    else
5261       v >>= shift;
5262    STBI_ASSERT(v < 256);
5263    v >>= (8-bits);
5264    STBI_ASSERT(bits >= 0 && bits <= 8);
5265    return (int) ((unsigned) v * mul_table[bits]) >> shift_table[bits];
5266 }
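stbi__shiftsigned first moves the masked value so its top bit sits at bit 7, then right-aligns it. It then widens the N-bit value to 8 bits with one multiply and one shift. `mul_table[bits]` places copies of the value every `bits` positions, and `shift_table[bits]` drops the excess low bits. This is the usual bit-replication trick: a 5-bit 16 becomes 132, and a 5-bit 31 becomes 255. The sketch below checks the table form against an explicit replication loop for every width and value. The function names are hypothetical.

```c
#include <assert.h>

/* Table form used by stbi__shiftsigned (v already right-aligned, 0 <= v < 2^bits). */
static unsigned widen_by_table(unsigned v, int bits)
{
   static const unsigned mul_table[9]   = { 0, 0xff, 0x55, 0x49, 0x11, 0x21, 0x41, 0x81, 0x01 };
   static const unsigned shift_table[9] = { 0, 0, 0, 1, 0, 2, 4, 6, 0 };
   return (v * mul_table[bits]) >> shift_table[bits];
}

/* Explicit bit replication: repeat the bits-wide pattern until at least
   8 bits are filled, then keep the top 8. */
static unsigned widen_by_replication(unsigned v, int bits)
{
   unsigned out = 0;
   int filled = 0;
   while (filled < 8) {
      out = (out << bits) | v;
      filled += bits;
   }
   return out >> (filled - 8);
}
```

The copies never overlap because v < 2^bits, so the multiply produces exactly the concatenated pattern with no carries.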
5267
5268 typedef struct
5269 {
5270    int bpp, offset, hsz;
5271    unsigned int mr,mg,mb,ma, all_a;
5272    int extra_read;
5273 } stbi__bmp_data;
5274
5275 static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info)
5276 {
5277    int hsz;
5278    if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
5279    stbi__get32le(s); // discard filesize
5280    stbi__get16le(s); // discard reserved
5281    stbi__get16le(s); // discard reserved
5282    info->offset = stbi__get32le(s);
5283    info->hsz = hsz = stbi__get32le(s);
5284    info->mr = info->mg = info->mb = info->ma = 0;
5285    info->extra_read = 14;
5286
5287    if (info->offset < 0) return stbi__errpuc("bad BMP", "bad BMP");
5288
5289    if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
5290    if (hsz == 12) {
5291       s->img_x = stbi__get16le(s);
5292       s->img_y = stbi__get16le(s);
5293    } else {
5294       s->img_x = stbi__get32le(s);
5295       s->img_y = stbi__get32le(s);
5296    }
5297    if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
5298    info->bpp = stbi__get16le(s);
5299    if (hsz != 12) {
5300       int compress = stbi__get32le(s);
5301       if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
5302       stbi__get32le(s); // discard sizeof
5303       stbi__get32le(s); // discard hres
5304       stbi__get32le(s); // discard vres
5305       stbi__get32le(s); // discard colorsused
5306       stbi__get32le(s); // discard max important
5307       if (hsz == 40 || hsz == 56) {
5308          if (hsz == 56) {
5309             stbi__get32le(s); // discard red mask
5310             stbi__get32le(s); // discard green mask
5311             stbi__get32le(s); // discard blue mask
5312             stbi__get32le(s); // discard alpha mask
5313          }
5314          if (info->bpp == 16 || info->bpp == 32) {
5315             if (compress == 0) {
5316                if (info->bpp == 32) {
5317                   info->mr = 0xffu << 16;
5318                   info->mg = 0xffu <<  8;
5319                   info->mb = 0xffu <<  0;
5320                   info->ma = 0xffu << 24;
5321                   info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
5322                } else {
5323                   info->mr = 31u << 10;
5324                   info->mg = 31u <<  5;
5325                   info->mb = 31u <<  0;
5326                }
5327             } else if (compress == 3) {
5328                info->mr = stbi__get32le(s);
5329                info->mg = stbi__get32le(s);
5330                info->mb = stbi__get32le(s);
5331                info->extra_read += 12;
5332                // not documented, but generated by photoshop and handled by mspaint
5333                if (info->mr == info->mg && info->mg == info->mb) {
5334                   // identical R/G/B masks leave the channels indistinguishable
5335                   return stbi__errpuc("bad BMP", "bad BMP");
5336                }
5337             } else
5338                return stbi__errpuc("bad BMP", "bad BMP");
5339          }
5340       } else {
5341          int i;
5342          if (hsz != 108 && hsz != 124)
5343             return stbi__errpuc("bad BMP", "bad BMP");
5344          info->mr = stbi__get32le(s);
5345          info->mg = stbi__get32le(s);
5346          info->mb = stbi__get32le(s);
5347          info->ma = stbi__get32le(s);
5348          stbi__get32le(s); // discard color space
5349          for (i=0; i < 12; ++i)
5350             stbi__get32le(s); // discard color space parameters
5351          if (hsz == 124) {
5352             stbi__get32le(s); // discard rendering intent
5353             stbi__get32le(s); // discard offset of profile data
5354             stbi__get32le(s); // discard size of profile data
5355             stbi__get32le(s); // discard reserved
5356          }
5357       }
5358    }
5359    return (void *) 1;
5360 }
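stbi__bmp_parse_header distinguishes BMP variants purely by the DIB header size. The mapping below gives the common Windows names for those sizes. It is informational only; stb_image never names them. The sizes also explain the field skips above. V3 adds four channel masks to the 40-byte header. V4 adds a color-space type plus 12 endpoint and gamma values. V5 adds four more 32-bit fields for rendering intent and an embedded profile. `bmp_header_name` is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Common names for the DIB header sizes accepted by stbi__bmp_parse_header;
   NULL for sizes that are rejected as "unknown BMP". */
static const char *bmp_header_name(int hsz)
{
   switch (hsz) {
      case 12:  return "BITMAPCOREHEADER";    /* 16-bit width/height, no compression field */
      case 40:  return "BITMAPINFOHEADER";
      case 56:  return "BITMAPV3INFOHEADER";  /* 40 bytes + R, G, B, A masks */
      case 108: return "BITMAPV4HEADER";      /* + masks, color space, endpoints, gamma */
      case 124: return "BITMAPV5HEADER";      /* + intent, profile offset/size, reserved */
      default:  return NULL;
   }
}
```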
5361
5362
5363 static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
5364 {
5365    stbi_uc *out;
5366    unsigned int mr=0,mg=0,mb=0,ma=0, all_a;
5367    stbi_uc pal[256][4];
5368    int psize=0,i,j,width;
5369    int flip_vertically, pad, target;
5370    stbi__bmp_data info;
5371    STBI_NOTUSED(ri);
5372
5373    info.all_a = 255;
5374    if (stbi__bmp_parse_header(s, &info) == NULL)
5375       return NULL; // error code already set
5376
5377    flip_vertically = ((int) s->img_y) > 0;
5378    s->img_y = abs((int) s->img_y);
5379
5380    if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
5381    if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
5382
5383    mr = info.mr;
5384    mg = info.mg;
5385    mb = info.mb;
5386    ma = info.ma;
5387    all_a = info.all_a;
5388
5389    if (info.hsz == 12) {
5390       if (info.bpp < 24)
5391          psize = (info.offset - info.extra_read - 24) / 3;
5392    } else {
5393       if (info.bpp < 16)
5394          psize = (info.offset - info.extra_read - info.hsz) >> 2;
5395    }
5396    if (psize == 0) {
5397       // no palette: the pixel data must start right where the headers end
5398       if (info.offset != s->callback_already_read + (int) (s->img_buffer - s->img_buffer_original)) {
5399          return stbi__errpuc("bad offset", "Corrupt BMP");
5400       }
5401    }
5402
5403    if (info.bpp == 24 && ma == 0xff000000)
5404       s->img_n = 3;
5405    else
5406       s->img_n = ma ? 4 : 3;
5407    if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
5408       target = req_comp;
5409    else
5410       target = s->img_n; // if they want monochrome, we'll post-convert
5411
5412    // sanity-check size
5413    if (!stbi__mad3sizes_valid(target, s->img_x, s->img_y, 0))
5414       return stbi__errpuc("too large", "Corrupt BMP");
5415
5416    out = (stbi_uc *) stbi__malloc_mad3(target, s->img_x, s->img_y, 0);
5417    if (!out) return stbi__errpuc("outofmem", "Out of memory");
5418    if (info.bpp < 16) {
5419       int z=0;
5420       if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
5421       for (i=0; i < psize; ++i) {
5422          pal[i][2] = stbi__get8(s);
5423          pal[i][1] = stbi__get8(s);
5424          pal[i][0] = stbi__get8(s);
5425          if (info.hsz != 12) stbi__get8(s);
5426          pal[i][3] = 255;
5427       }
5428       stbi__skip(s, info.offset - info.extra_read - info.hsz - psize * (info.hsz == 12 ? 3 : 4));
5429       if (info.bpp == 1) width = (s->img_x + 7) >> 3;
5430       else if (info.bpp == 4) width = (s->img_x + 1) >> 1;
5431       else if (info.bpp == 8) width = s->img_x;
5432       else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
5433       pad = (-width)&3;
5434       if (info.bpp == 1) {
5435          for (j=0; j < (int) s->img_y; ++j) {
5436             int bit_offset = 7, v = stbi__get8(s);
5437             for (i=0; i < (int) s->img_x; ++i) {
5438                int color = (v>>bit_offset)&0x1;
5439                out[z++] = pal[color][0];
5440                out[z++] = pal[color][1];
5441                out[z++] = pal[color][2];
5442                if (target == 4) out[z++] = 255;
5443                if (i+1 == (int) s->img_x) break;
5444                if((--bit_offset) < 0) {
5445                   bit_offset = 7;
5446                   v = stbi__get8(s);
5447                }
5448             }
5449             stbi__skip(s, pad);
5450          }
5451       } else {
5452          for (j=0; j < (int) s->img_y; ++j) {
5453             for (i=0; i < (int) s->img_x; i += 2) {
               int v=stbi__get8(s),v2=0;
               if (info.bpp == 4) {
                  v2 = v & 15;
                  v >>= 4;
               }
               out[z++] = pal[v][0];
               out[z++] = pal[v][1];
               out[z++] = pal[v][2];
               if (target == 4) out[z++] = 255;
               if (i+1 == (int) s->img_x) break;
               v = (info.bpp == 8) ? stbi__get8(s) : v2;
               out[z++] = pal[v][0];
               out[z++] = pal[v][1];
               out[z++] = pal[v][2];
               if (target == 4) out[z++] = 255;
            }
            stbi__skip(s, pad);
         }
      }
   } else {
      int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
      int z = 0;
      int easy=0;
      stbi__skip(s, info.offset - info.extra_read - info.hsz);
      if (info.bpp == 24) width = 3 * s->img_x;
      else if (info.bpp == 16) width = 2*s->img_x;
      else /* bpp = 32 and pad = 0 */ width=0;
      pad = (-width) & 3;
      if (info.bpp == 24) {
         easy = 1;
      } else if (info.bpp == 32) {
         if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
            easy = 2;
      }
      if (!easy) {
         if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
         // right shift amt to put high bit in position #7
         rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
         gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
         bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
         ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
         if (rcount > 8 || gcount > 8 || bcount > 8 || acount > 8) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
      }
      for (j=0; j < (int) s->img_y; ++j) {
         if (easy) {
            for (i=0; i < (int) s->img_x; ++i) {
               unsigned char a;
               out[z+2] = stbi__get8(s);
               out[z+1] = stbi__get8(s);
               out[z+0] = stbi__get8(s);
               z += 3;
               a = (easy == 2 ? stbi__get8(s) : 255);
               all_a |= a;
               if (target == 4) out[z++] = a;
            }
         } else {
            int bpp = info.bpp;
            for (i=0; i < (int) s->img_x; ++i) {
               stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
               unsigned int a;
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
               a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
               all_a |= a;
               if (target == 4) out[z++] = STBI__BYTECAST(a);
            }
         }
         stbi__skip(s, pad);
      }
   }

   // if alpha channel is all 0s, replace with all 255s
   if (target == 4 && all_a == 0)
      for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
         out[i] = 255;

   if (flip_vertically) {
      stbi_uc t;
      for (j=0; j < (int) s->img_y>>1; ++j) {
         stbi_uc *p1 = out +      j     *s->img_x*target;
         stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
         for (i=0; i < (int) s->img_x*target; ++i) {
            t = p1[i]; p1[i] = p2[i]; p2[i] = t;
         }
      }
   }

   if (req_comp && req_comp != target) {
      out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
      if (out == NULL) return out; // stbi__convert_format frees input on failure
   }

   *x = s->img_x;
   *y = s->img_y;
   if (comp) *comp = s->img_n;
   return out;
}
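// Sketch, not part of stb_image: how the bitfield (non-"easy") path above turns
// a BMP channel mask into an 8-bit value. sketch_high_bit, sketch_bitcount and
// sketch_extract_channel are hypothetical helpers. The library widens the
// extracted bits with stbi__shiftsigned (bit replication); this sketch swaps in
// a rounded rescale, field * 255 / max, which agrees at 0 and at full scale but
// is not guaranteed to match it bit-for-bit in between. It assumes a contiguous
// mask; the "bad masks" check above only guarantees at most 8 bits.
static int sketch_high_bit(unsigned int z)
{
   int n = -1;
   while (z) { ++n; z >>= 1; }   // index of the highest set bit, -1 for z == 0
   return n;
}

static int sketch_bitcount(unsigned int z)
{
   int n = 0;
   while (z) { n += (int)(z & 1u); z >>= 1; }
   return n;
}

static unsigned char sketch_extract_channel(unsigned int v, unsigned int mask)
{
   int count, low;
   unsigned int field, max;
   if (mask == 0) return 255;                  // absent channel: same default as missing alpha
   count = sketch_bitcount(mask);
   low   = sketch_high_bit(mask) - count + 1;  // lowest bit of a contiguous mask
   field = (v & mask) >> low;
   max   = (1u << count) - 1u;
   return (unsigned char)((field * 255u + max / 2u) / max);   // rounded rescale to 0..255
}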
#endif
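// Sketch, not part of stb_image: the 4-bit palette branch of the BMP loader
// packs two pixel indices per byte, high nibble first, and stops mid-byte on an
// odd-width row. Each row is then padded to a multiple of 4 bytes, which is what
// "pad = (-width) & 3" computes. Both helpers below are hypothetical.
static int sketch_unpack_4bpp_row(const unsigned char *src, int width, unsigned char *indices)
{
   int i, used = 0;
   for (i = 0; i < width; i += 2) {
      unsigned char byte = src[used++];
      indices[i] = (unsigned char)(byte >> 4);          // first pixel: high nibble
      if (i + 1 < width)
         indices[i + 1] = (unsigned char)(byte & 15);   // second pixel: low nibble
   }
   return used;                                         // bytes consumed, excluding padding
}

static int sketch_bmp_row_pad(int row_bytes)
{
   return (4 - (row_bytes & 3)) & 3;                    // bytes needed to reach a 4-byte boundary
}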

// Targa Truevision - TGA
// by Jonathan Dummer
#ifndef STBI_NO_TGA
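// returns the number of components (STBI_grey, STBI_grey_alpha, STBI_rgb or
// STBI_rgb_alpha), or 0 for an unsupported bit depth; sets *is_rgb16 for
// 15/16-bit color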
static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int* is_rgb16)
{
   // only RGB or RGBA (incl. 16bit) or grey allowed
   if (is_rgb16) *is_rgb16 = 0;
   switch(bits_per_pixel) {
      case 8:  return STBI_grey;
      case 16: if(is_grey) return STBI_grey_alpha;
               // fallthrough
      case 15: if(is_rgb16) *is_rgb16 = 1;
               return STBI_rgb;
      case 24: // fallthrough
      case 32: return bits_per_pixel/8;
      default: return 0;
   }
}

static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
{
    int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel, tga_colormap_bpp;
    int sz, tga_colormap_type;
    stbi__get8(s);                   // discard Offset
    tga_colormap_type = stbi__get8(s); // colormap type
    if( tga_colormap_type > 1 ) {
        stbi__rewind(s);
        return 0;      // only RGB or indexed allowed
    }
    tga_image_type = stbi__get8(s); // image type
    if ( tga_colormap_type == 1 ) { // colormapped (paletted) image
        if (tga_image_type != 1 && tga_image_type != 9) {
            stbi__rewind(s);
            return 0;
        }
        stbi__skip(s,4);       // skip index of first colormap entry and number of entries
        sz = stbi__get8(s);    //   check bits per palette color entry
        if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) {
            stbi__rewind(s);
            return 0;
        }
        stbi__skip(s,4);       // skip image x and y origin
        tga_colormap_bpp = sz;
    } else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
        if ( (tga_image_type != 2) && (tga_image_type != 3) && (tga_image_type != 10) && (tga_image_type != 11) ) {
            stbi__rewind(s);
            return 0; // only RGB or grey allowed, +/- RLE
        }
        stbi__skip(s,9); // skip colormap specification and image x/y origin
        tga_colormap_bpp = 0;
    }
    tga_w = stbi__get16le(s);
    if( tga_w < 1 ) {
        stbi__rewind(s);
        return 0;   // test width
    }
    tga_h = stbi__get16le(s);
    if( tga_h < 1 ) {
        stbi__rewind(s);
        return 0;   // test height
    }
    tga_bits_per_pixel = stbi__get8(s); // bits per pixel
    stbi__get8(s); // ignore alpha bits
    if (tga_colormap_bpp != 0) {
        if((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
            // when using a colormap, tga_bits_per_pixel is the size of the indexes
            // I don't think anything but 8 or 16bit indexes makes sense
            stbi__rewind(s);
            return 0;
        }
        tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
    } else {
        tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11), NULL);
    }
    if(!tga_comp) {
      stbi__rewind(s);
      return 0;
    }
    if (x) *x = tga_w;
    if (y) *y = tga_h;
    if (comp) *comp = tga_comp;
    return 1;                   // seems to have passed everything
}
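// Sketch, not part of stb_image: the 18-byte TGA header that stbi__tga_info,
// stbi__tga_test and stbi__tga_load walk through with stbi__get8/stbi__get16le,
// decoded here from a memory buffer instead. The struct and function names are
// hypothetical. Image types 9-11 are the RLE variants of 1-3, and bit 5 of the
// descriptor byte set means rows are stored top-down; when it is clear, rows
// are stored bottom-up, which is the case tga_inverted flags for flipping.
typedef struct {
   int id_length, colormap_type, image_type, is_rle;
   int colormap_first, colormap_len, colormap_bits;
   int width, height, bits_per_pixel, top_to_bottom;
} sketch_tga_header;

static int sketch_le16(const unsigned char *p) { return p[0] | (p[1] << 8); }

// Returns 1 on success, 0 if fewer than 18 bytes are available.
static int sketch_parse_tga_header(const unsigned char *b, int len, sketch_tga_header *h)
{
   if (len < 18) return 0;
   h->id_length      = b[0];
   h->colormap_type  = b[1];
   h->is_rle         = b[2] >= 8;                       // 9, 10, 11 = RLE versions
   h->image_type     = h->is_rle ? b[2] - 8 : b[2];     // 1 = paletted, 2 = truecolor, 3 = grey
   h->colormap_first = sketch_le16(b + 3);
   h->colormap_len   = sketch_le16(b + 5);
   h->colormap_bits  = b[7];
   // b[8..11] hold the x/y origin, which stb_image also ignores
   h->width          = sketch_le16(b + 12);
   h->height         = sketch_le16(b + 14);
   h->bits_per_pixel = b[16];
   h->top_to_bottom  = (b[17] >> 5) & 1;
   return 1;
}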

static int stbi__tga_test(stbi__context *s)
{
   int res = 0;
   int sz, tga_color_type;
   stbi__get8(s);      //   discard Offset
   tga_color_type = stbi__get8(s);   //   color type
   if ( tga_color_type > 1 ) goto errorEnd;   //   only RGB or indexed allowed
   sz = stbi__get8(s);   //   image type
   if ( tga_color_type == 1 ) { // colormapped (paletted) image
      if (sz != 1 && sz != 9) goto errorEnd; // colortype 1 demands image type 1 or 9
      stbi__skip(s,4);       // skip index of first colormap entry and number of entries
      sz = stbi__get8(s);    //   check bits per palette color entry
      if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
      stbi__skip(s,4);       // skip image x and y origin
   } else { // "normal" image w/o colormap
      if ( (sz != 2) && (sz != 3) && (sz != 10) && (sz != 11) ) goto errorEnd; // only RGB or grey allowed, +/- RLE
      stbi__skip(s,9); // skip colormap specification and image x/y origin
   }
   if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test width
   if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test height
   sz = stbi__get8(s);   //   bits per pixel
   if ( (tga_color_type == 1) && (sz != 8) && (sz != 16) ) goto errorEnd; // for colormapped images, bpp is size of an index
   if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;

   res = 1; // if we got this far, everything's good and we can return 1 instead of 0

errorEnd:
   stbi__rewind(s);
   return res;
}

// read 16bit value and convert to 24bit RGB
static void stbi__tga_read_rgb16(stbi__context *s, stbi_uc* out)
{
   stbi__uint16 px = (stbi__uint16)stbi__get16le(s);
   stbi__uint16 fiveBitMask = 31;
   // we have 3 channels with 5bits each
   int r = (px >> 10) & fiveBitMask;
   int g = (px >> 5) & fiveBitMask;
   int b = px & fiveBitMask;
   // Note that this saves the data in RGB(A) order, so it doesn't need to be swapped later
   out[0] = (stbi_uc)((r * 255)/31);
   out[1] = (stbi_uc)((g * 255)/31);
   out[2] = (stbi_uc)((b * 255)/31);

   // some people claim that the most significant bit might be used for alpha
   // (possibly if an alpha-bit is set in the "image descriptor byte")
   // but that only made 16bit test images completely translucent,
   // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
}
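// Sketch, not part of stb_image: the same 5-5-5 expansion as
// stbi__tga_read_rgb16, applied to a value already assembled from its two
// little-endian bytes. sketch_rgb555_to_rgb888 is a hypothetical name. Bit 15 is
// ignored, matching the decision above to treat 15/16-bit TGAs as opaque RGB,
// and (c * 255) / 31 maps 0 to 0 and 31 to 255 with truncating division.
static void sketch_rgb555_to_rgb888(unsigned int px, unsigned char out[3])
{
   unsigned int r = (px >> 10) & 31u;   // bits 10-14
   unsigned int g = (px >> 5) & 31u;    // bits 5-9
   unsigned int b = px & 31u;           // bits 0-4
   out[0] = (unsigned char)((r * 255u) / 31u);
   out[1] = (unsigned char)((g * 255u) / 31u);
   out[2] = (unsigned char)((b * 255u) / 31u);
}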

static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
{
   //   read in the TGA header stuff
   int tga_offset = stbi__get8(s);
   int tga_indexed = stbi__get8(s);
   int tga_image_type = stbi__get8(s);
   int tga_is_RLE = 0;
   int tga_palette_start = stbi__get16le(s);
   int tga_palette_len = stbi__get16le(s);
   int tga_palette_bits = stbi__get8(s);
   int tga_x_origin = stbi__get16le(s);
   int tga_y_origin = stbi__get16le(s);
   int tga_width = stbi__get16le(s);
   int tga_height = stbi__get16le(s);
   int tga_bits_per_pixel = stbi__get8(s);
   int tga_comp, tga_rgb16=0;
   int tga_inverted = stbi__get8(s);
   // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused (useless?)
   //   image data
   unsigned char *tga_data;
   unsigned char *tga_palette = NULL;
   int i, j;
   unsigned char raw_data[4] = {0};
   int RLE_count = 0;
   int RLE_repeating = 0;
   int read_next_pixel = 1;
   STBI_NOTUSED(ri);
   STBI_NOTUSED(tga_x_origin); // @TODO
   STBI_NOTUSED(tga_y_origin); // @TODO

   if (tga_height > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   if (tga_width > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");

   //   do a tiny bit of processing
   if ( tga_image_type >= 8 )
   {
      tga_image_type -= 8;
      tga_is_RLE = 1;
   }
   tga_inverted = 1 - ((tga_inverted >> 5) & 1);

   //   If I'm paletted, then I'll use the number of bits from the palette
   if ( tga_indexed ) tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
   else tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3), &tga_rgb16);

   if(!tga_comp) // shouldn't really happen, stbi__tga_test() should have ensured basic consistency
      return stbi__errpuc("bad format", "Can't find out TGA pixelformat");

   //   tga info
   *x = tga_width;
   *y = tga_height;
   if (comp) *comp = tga_comp;

   if (!stbi__mad3sizes_valid(tga_width, tga_height, tga_comp, 0))
      return stbi__errpuc("too large", "Corrupt TGA");

   tga_data = (unsigned char*)stbi__malloc_mad3(tga_width, tga_height, tga_comp, 0);
   if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");

   // skip the image ID field, whose length is the first header byte (usually 0)
   stbi__skip(s, tga_offset );

   if ( !tga_indexed && !tga_is_RLE && !tga_rgb16 ) {
      for (i=0; i < tga_height; ++i) {
         int row = tga_inverted ? tga_height -i - 1 : i;
         stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
         stbi__getn(s, tga_row, tga_width * tga_comp);
      }
   } else  {
      //   do I need to load a palette?
      if ( tga_indexed)
      {
         if (tga_palette_len == 0) {  /* you have to have at least one entry! */
            STBI_FREE(tga_data);
            return stbi__errpuc("bad palette", "Corrupt TGA");
         }

         //   any data to skip? (offset usually = 0)
         stbi__skip(s, tga_palette_start );
         //   load the palette
         tga_palette = (unsigned char*)stbi__malloc_mad2(tga_palette_len, tga_comp, 0);
         if (!tga_palette) {
            STBI_FREE(tga_data);
            return stbi__errpuc("outofmem", "Out of memory");
         }
         if (tga_rgb16) {
            stbi_uc *pal_entry = tga_palette;
            STBI_ASSERT(tga_comp == STBI_rgb);
            for (i=0; i < tga_palette_len; ++i) {
               stbi__tga_read_rgb16(s, pal_entry);
               pal_entry += tga_comp;
            }
         } else if (!stbi__getn(s, tga_palette, tga_palette_len * tga_comp)) {
               STBI_FREE(tga_data);
               STBI_FREE(tga_palette);
               return stbi__errpuc("bad palette", "Corrupt TGA");
         }
      }
      //   load the data
      for (i=0; i < tga_width * tga_height; ++i)
      {
         //   if I'm in RLE mode, do I need to get a RLE chunk?
         if ( tga_is_RLE )
         {
            if ( RLE_count == 0 )
            {
               //   yep, get the next byte as a RLE command
               int RLE_cmd = stbi__get8(s);
               RLE_count = 1 + (RLE_cmd & 127);
               RLE_repeating = RLE_cmd >> 7;
               read_next_pixel = 1;
            } else if ( !RLE_repeating )
            {
               read_next_pixel = 1;
            }
         } else
         {
            read_next_pixel = 1;
         }
         //   OK, if I need to read a pixel, do it now
         if ( read_next_pixel )
         {
            //   load however much data we did have
            if ( tga_indexed )
            {
               // read in index, then perform the lookup
               int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s) : stbi__get16le(s);
               if ( pal_idx >= tga_palette_len ) {
                  // invalid index
                  pal_idx = 0;
               }
               pal_idx *= tga_comp;
               for (j = 0; j < tga_comp; ++j) {
                  raw_data[j] = tga_palette[pal_idx+j];
               }
            } else if(tga_rgb16) {
               STBI_ASSERT(tga_comp == STBI_rgb);
               stbi__tga_read_rgb16(s, raw_data);
            } else {
               //   read in the data raw
               for (j = 0; j < tga_comp; ++j) {
                  raw_data[j] = stbi__get8(s);
               }
            }
            //   clear the reading flag for the next pixel
            read_next_pixel = 0;
         } // end of reading a pixel

         // copy data
         for (j = 0; j < tga_comp; ++j)
           tga_data[i*tga_comp+j] = raw_data[j];

         //   in case we're in RLE mode, keep counting down
         --RLE_count;
      }
      //   do I need to invert the image?
      if ( tga_inverted )
      {
         for (j = 0; j*2 < tga_height; ++j)
         {
            int index1 = j * tga_width * tga_comp;
            int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
            for (i = tga_width * tga_comp; i > 0; --i)
            {
               unsigned char temp = tga_data[index1];
               tga_data[index1] = tga_data[index2];
               tga_data[index2] = temp;
               ++index1;
               ++index2;
            }
         }
      }
      //   clear my palette, if I had one
      if ( tga_palette != NULL )
      {
         STBI_FREE( tga_palette );
      }
   }

   // swap RGB - if the source data was RGB16, it already is in the right order
   if (tga_comp >= 3 && !tga_rgb16)
   {
      unsigned char* tga_pixel = tga_data;
      for (i=0; i < tga_width * tga_height; ++i)
      {
         unsigned char temp = tga_pixel[0];
         tga_pixel[0] = tga_pixel[2];
         tga_pixel[2] = temp;
         tga_pixel += tga_comp;
      }
   }

   // convert to target component count
   if (req_comp && req_comp != tga_comp)
      tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);

   //   the things I do to get rid of an error message, and yet keep
   //   Microsoft's C compilers happy... [8^(
   tga_palette_start = tga_palette_len = tga_palette_bits =
         tga_x_origin = tga_y_origin = 0;
   STBI_NOTUSED(tga_palette_start);
   //   OK, done
   return tga_data;
}
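// Sketch, not part of stb_image: the run-length scheme handled by the RLE_cmd
// logic above, decoded from a memory buffer. Each packet starts with a header
// byte; the low 7 bits plus one give the pixel count, and the top bit selects a
// run (one pixel value repeated) or a raw packet (that many literal pixels).
// sketch_tga_rle_decode is hypothetical; unlike the stream loop above, it checks
// for truncated input explicitly instead of relying on the reader's EOF behavior.
static int sketch_tga_rle_decode(const unsigned char *src, int src_len,
                                 unsigned char *dst, int pixel_count, int bpp)
{
   int pos = 0, done = 0, k;
   while (done < pixel_count) {
      int header, count;
      if (pos >= src_len) return 0;               // truncated input
      header = src[pos++];
      count = 1 + (header & 127);                 // 1..128 pixels
      if (count > pixel_count - done)
         count = pixel_count - done;              // clip a packet that overruns the image
      if (header & 128) {                         // run: one pixel value, repeated
         if (pos + bpp > src_len) return 0;
         for (k = 0; k < count * bpp; ++k)
            dst[done * bpp + k] = src[pos + k % bpp];
         pos += bpp;
      } else {                                    // raw: count literal pixels
         if (pos + count * bpp > src_len) return 0;
         for (k = 0; k < count * bpp; ++k)
            dst[done * bpp + k] = src[pos + k];
         pos += count * bpp;
      }
      done += count;
   }
   return 1;
}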
#endif
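// Sketch, not part of stb_image: the in-place row swap used when a TGA is
// stored bottom-up (the tga_inverted block) and by the BMP loader's
// flip_vertically step. Row j trades places with row height-1-j, so only the
// first height/2 rows need visiting and an odd middle row stays put.
// sketch_flip_rows is a hypothetical name; bytes_per_row is width * components.
static void sketch_flip_rows(unsigned char *pixels, int height, int bytes_per_row)
{
   int j, i;
   for (j = 0; j < height / 2; ++j) {
      unsigned char *top    = pixels + j * bytes_per_row;
      unsigned char *bottom = pixels + (height - 1 - j) * bytes_per_row;
      for (i = 0; i < bytes_per_row; ++i) {
         unsigned char t = top[i];
         top[i] = bottom[i];
         bottom[i] = t;
      }
   }
}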

// *************************************************************************************************
// Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB

#ifndef STBI_NO_PSD
static int stbi__psd_test(stbi__context *s)
{
   int r = (stbi__get32be(s) == 0x38425053);
   stbi__rewind(s);
   return r;
}
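// Sketch, not part of stb_image: the constant 0x38425053 checked above is the
// ASCII signature "8BPS" read as a big-endian 32-bit integer ('8' = 0x38,
// 'B' = 0x42, 'P' = 0x50, 'S' = 0x53). sketch_be32 and sketch_is_psd are
// hypothetical helpers that perform the same test on a memory buffer.
static unsigned long sketch_be32(const unsigned char *p)
{
   return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
          ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

static int sketch_is_psd(const unsigned char *buf, int len)
{
   return len >= 4 && sketch_be32(buf) == 0x38425053ul;
}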

static int stbi__psd_decode_rle(stbi__context *s, stbi_uc *p, int pixelCount)
{
   int count, nleft, len;

   count = 0;
   while ((nleft = pixelCount - count) > 0) {
      len = stbi__get8(s);
      if (len == 128) {
         // No-op.
      } else if (len < 128) {
         // Copy next len+1 bytes literally.
         len++;
         if (len > nleft) return 0; // corrupt data
         count += len;
         while (len) {
            *p = stbi__get8(s);
            p += 4;
            len--;
         }
      } else if (len > 128) {
         stbi_uc   val;
         // Next -len+1 bytes in the dest are replicated from next source byte.
         // (Interpret len as a negative 8-bit int.)
         len = 257 - len;
         if (len > nleft) return 0; // corrupt data
         val = stbi__get8(s);
         count += len;
         while (len) {
            *p = val;
            p += 4;
            len--;
         }
      }
   }

   return 1;
}
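// Sketch, not part of stb_image: the PackBits decoding done by
// stbi__psd_decode_rle, reading from a memory buffer. As there, output bytes go
// to every 'stride'-th position, so one planar PSD channel lands in its slot of
// an interleaved RGBA buffer (stride 4 above). Header values 0-127 mean "copy
// the next n+1 bytes", 129-255 mean "repeat the next byte 257-n times", and 128
// is a no-op. sketch_packbits_decode is hypothetical; it returns the number of
// source bytes consumed, or -1 on corrupt or truncated data.
static int sketch_packbits_decode(const unsigned char *src, int src_len,
                                  unsigned char *dst, int count, int stride)
{
   int pos = 0, written = 0;
   while (written < count) {
      int n, len;
      if (pos >= src_len) return -1;              // truncated input
      n = src[pos++];
      if (n == 128) continue;                     // no-op
      if (n < 128) {                              // copy the next n+1 bytes literally
         len = n + 1;
         if (len > count - written || pos + len > src_len) return -1;
         while (len-- > 0) {
            dst[written * stride] = src[pos++];
            ++written;
         }
      } else {                                    // repeat the next byte 257-n times
         len = 257 - n;
         if (len > count - written || pos >= src_len) return -1;
         while (len-- > 0) {
            dst[written * stride] = src[pos];
            ++written;
         }
         ++pos;
      }
   }
   return pos;
}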

static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
{
   int pixelCount;
   int channelCount, compression;
   int channel, i;
   int bitdepth;
   int w,h;
   stbi_uc *out;
   STBI_NOTUSED(ri);

   // Check identifier
   if (stbi__get32be(s) != 0x38425053)   // "8BPS"
      return stbi__errpuc("not PSD", "Corrupt PSD image");

   // Check file type version.
   if (stbi__get16be(s) != 1)
      return stbi__errpuc("wrong version", "Unsupported version of PSD image");

   // Skip 6 reserved bytes.
   stbi__skip(s, 6 );

   // Read the number of channels (R, G, B, A, etc).
   channelCount = stbi__get16be(s);
   if (channelCount < 0 || channelCount > 16)
      return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");

   // Read the rows and columns of the image.
   h = stbi__get32be(s);
   w = stbi__get32be(s);

   if (h > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   if (w > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");

   // Make sure the depth is 8 or 16 bits.
   bitdepth = stbi__get16be(s);
   if (bitdepth != 8 && bitdepth != 16)
      return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");

   // Make sure the color mode is RGB.
   // Valid options are:
   //   0: Bitmap
   //   1: Grayscale
   //   2: Indexed color
   //   3: RGB color
   //   4: CMYK color
   //   7: Multichannel
   //   8: Duotone
   //   9: Lab color
   if (stbi__get16be(s) != 3)
      return stbi__errpuc("wrong color format", "PSD is not in RGB color format");

   // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
   stbi__skip(s,stbi__get32be(s) );

   // Skip the image resources.  (resolution, pen tool paths, etc)
   stbi__skip(s, stbi__get32be(s) );

   // Skip the layer and mask information section.
   stbi__skip(s, stbi__get32be(s) );

   // Find out if the data is compressed.
   // Known values:
   //   0: no compression
   //   1: RLE compressed
   compression = stbi__get16be(s);
   if (compression > 1)
      return stbi__errpuc("bad compression", "PSD has an unknown compression format");

   // Check size
   if (!stbi__mad3sizes_valid(4, w, h, 0))
      return stbi__errpuc("too large", "Corrupt PSD");

   // Create the destination image.

   if (!compression && bitdepth == 16 && bpc == 16) {
      out = (stbi_uc *) stbi__malloc_mad3(8, w, h, 0);
      ri->bits_per_channel = 16;
   } else
      out = (stbi_uc *) stbi__malloc(4 * w*h);

   if (!out) return stbi__errpuc("outofmem", "Out of memory");
   pixelCount = w*h;

   // Initialize the data to zero.
   //memset( out, 0, pixelCount * 4 );

   // Finally, the image data.
   if (compression) {
      // RLE as used by .PSD and .TIFF
      // Loop until you get the number of unpacked bytes you are expecting:
      //     Read the next source byte into n.
      //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
      //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
      //     Else if n is -128, noop.
      // Endloop

      // The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
      // which we're going to just skip.
      stbi__skip(s, h * channelCount * 2 );

      // Read the RLE data by channel.
      for (channel = 0; channel < 4; channel++) {
         stbi_uc *p;

         p = out+channel;
         if (channel >= channelCount) {
            // Fill this channel with default data.
            for (i = 0; i < pixelCount; i++, p += 4)
               *p = (channel == 3 ? 255 : 0);
         } else {
            // Read the RLE data.
            if (!stbi__psd_decode_rle(s, p, pixelCount)) {
               STBI_FREE(out);
               return stbi__errpuc("corrupt", "bad RLE data");
            }
         }
      }

   } else {
      // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
      // where each channel consists of an 8-bit (or 16-bit) value for each pixel in the image.

      // Read the data by channel.
      for (channel = 0; channel < 4; channel++) {
         if (channel >= channelCount) {
            // Fill this channel with default data.
            if (bitdepth == 16 && bpc == 16) {
               stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
               stbi__uint16 val = channel == 3 ? 65535 : 0;
               for (i = 0; i < pixelCount; i++, q += 4)
                  *q = val;
            } else {
               stbi_uc *p = out+channel;
               stbi_uc val = channel == 3 ? 255 : 0;
               for (i = 0; i < pixelCount; i++, p += 4)
                  *p = val;
            }
         } else {
            if (ri->bits_per_channel == 16) {    // output bpc
               stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
               for (i = 0; i < pixelCount; i++, q += 4)
                  *q = (stbi__uint16) stbi__get16be(s);
            } else {
               stbi_uc *p = out+channel;
               if (bitdepth == 16) {  // input bpc
                  for (i = 0; i < pixelCount; i++, p += 4)
                     *p = (stbi_uc) (stbi__get16be(s) >> 8);
               } else {
                  for (i = 0; i < pixelCount; i++, p += 4)
                     *p = stbi__get8(s);
               }
            }
         }
      }
   }

   // remove weird white matte from PSD
   if (channelCount >= 4) {
      if (ri->bits_per_channel == 16) {
         for (i=0; i < w*h; ++i) {
            stbi__uint16 *pixel = (stbi__uint16 *) out + 4*i;
            if (pixel[3] != 0 && pixel[3] != 65535) {
               float a = pixel[3] / 65535.0f;
               float ra = 1.0f / a;
               float inv_a = 65535.0f * (1 - ra);
               pixel[0] = (stbi__uint16) (pixel[0]*ra + inv_a);
               pixel[1] = (stbi__uint16) (pixel[1]*ra + inv_a);
               pixel[2] = (stbi__uint16) (pixel[2]*ra + inv_a);
            }
         }
      } else {
         for (i=0; i < w*h; ++i) {
            unsigned char *pixel = out + 4*i;
            if (pixel[3] != 0 && pixel[3] != 255) {
               float a = pixel[3] / 255.0f;
               float ra = 1.0f / a;
               float inv_a = 255.0f * (1 - ra);
               pixel[0] = (unsigned char) (pixel[0]*ra + inv_a);
               pixel[1] = (unsigned char) (pixel[1]*ra + inv_a);
               pixel[2] = (unsigned char) (pixel[2]*ra + inv_a);
            }
         }
      }
   }

   // convert to desired output format
   if (req_comp && req_comp != 4) {
      if (ri->bits_per_channel == 16)
         out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, 4, req_comp, w, h);
      else
         out = stbi__convert_format(out, 4, req_comp, w, h);
      if (out == NULL) return out; // stbi__convert_format frees input on failure
   }

   if (comp) *comp = 4;
   *y = h;
   *x = w;

   return out;
}
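// Sketch, not part of stb_image: what "remove weird white matte" undoes. The
// loop above assumes each color channel was blended over white,
// stored = c*a + 255*(1 - a) with a in [0,1], and solves for c:
// c = stored/a + 255*(1 - 1/a). With alpha = 255*a this equals
// 255*(stored + alpha - 255) / alpha, which sketch_unmatte_8bit evaluates in
// integer arithmetic. The name, the integer formulation, and the rounding and
// clamping to 0..255 are assumptions of this sketch; the float code above
// truncates and does not clamp.
static unsigned char sketch_unmatte_8bit(unsigned char stored, unsigned char alpha)
{
   long num, v;
   if (alpha == 0 || alpha == 255) return stored;   // the loop above skips these too
   num = 255L * ((long)stored + (long)alpha - 255L);
   if (num <= 0) return 0;                          // true color would be negative: clamp
   v = (num + alpha / 2) / alpha;                   // divide, rounding to nearest
   return (unsigned char)(v > 255 ? 255 : v);
}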
#endif

// *************************************************************************************************
// Softimage PIC loader
// by Tom Seddon
//
// See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
// See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/

#ifndef STBI_NO_PIC
static int stbi__pic_is4(stbi__context *s,const char *str)
{
   int i;
   for (i=0; i<4; ++i)
      if (stbi__get8(s) != (stbi_uc)str[i])
         return 0;

   return 1;
}

static int stbi__pic_test_core(stbi__context *s)
{
   int i;

   if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
      return 0;

   for(i=0;i<84;++i)
      stbi__get8(s);

   if (!stbi__pic_is4(s,"PICT"))
      return 0;

   return 1;
}

typedef struct
{
   stbi_uc size,type,channel;
} stbi__pic_packet;

static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
{
   int mask=0x80, i;

   for (i=0; i<4; ++i, mask>>=1) {
      if (channel & mask) {
         if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
         dest[i]=stbi__get8(s);
      }
   }

   return dest;
}

static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
{
   int mask=0x80,i;

   for (i=0;i<4; ++i, mask>>=1)
      if (channel&mask)
         dest[i]=src[i];
}
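// Sketch, not part of stb_image: stbi__readval and stbi__copyval walk a packet's
// channel byte from bit 0x80 downward, so 0x80, 0x40, 0x20 and 0x10 select the
// R, G, B and A slots of the RGBA intermediate pixel, and stbi__pic_load_core
// reports 4 components only if some packet sets 0x10.
// sketch_pic_mask_slots is a hypothetical helper listing which slots a mask fills.
static int sketch_pic_mask_slots(int channel, int slots[4])
{
   int mask = 0x80, i, n = 0;
   for (i = 0; i < 4; ++i, mask >>= 1)
      if (channel & mask)
         slots[n++] = i;   // 0 = R, 1 = G, 2 = B, 3 = A
   return n;
}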

static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
{
   int act_comp=0,num_packets=0,y,chained;
   stbi__pic_packet packets[10];

   // this will (should...) cater for even some bizarre stuff like having data
   // for the same channel in multiple packets.
   do {
      stbi__pic_packet *packet;

      if (num_packets==sizeof(packets)/sizeof(packets[0]))
         return stbi__errpuc("bad format","too many packets");

      packet = &packets[num_packets++];

      chained = stbi__get8(s);
      packet->size    = stbi__get8(s);
      packet->type    = stbi__get8(s);
      packet->channel = stbi__get8(s);

      act_comp |= packet->channel;

      if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
      if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
   } while (chained);

   *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?

   for(y=0; y<height; ++y) {
      int packet_idx;

      for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
         stbi__pic_packet *packet = &packets[packet_idx];
         stbi_uc *dest = result+y*width*4;

         switch (packet->type) {
            default:
               return stbi__errpuc("bad format","packet has bad compression type");

            case 0: {//uncompressed
               int x;

               for(x=0;x<width;++x, dest+=4)
                  if (!stbi__readval(s,packet->channel,dest))
                     return 0;
               break;
            }

            case 1://Pure RLE
               {
                  int left=width, i;

                  while (left>0) {
                     stbi_uc count,value[4];

                     count=stbi__get8(s);
                     if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");

                     if (count > left)
                        count = (stbi_uc) left;

                     if (!stbi__readval(s,packet->channel,value))  return 0;

                     for(i=0; i<count; ++i,dest+=4)
                        stbi__copyval(packet->channel,dest,value);
                     left -= count;
                  }
               }
               break;

            case 2: {//Mixed RLE
               int left=width;
               while (left>0) {
                  int count = stbi__get8(s), i;
                  if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");

                  if (count >= 128) { // Repeated
                     stbi_uc value[4];

                     if (count==128)
                        count = stbi__get16be(s);
                     else
                        count -= 127;
                     if (count > left)
                        return stbi__errpuc("bad file","scanline overrun");

                     if (!stbi__readval(s,packet->channel,value))
                        return 0;

                     for(i=0;i<count;++i, dest += 4)
                        stbi__copyval(packet->channel,dest,value);
                  } else { // Raw
                     ++count;
                     if (count>left) return stbi__errpuc("bad file","scanline overrun");

                     for(i=0;i<count;++i, dest+=4)
                        if (!stbi__readval(s,packet->channel,dest))
                           return 0;
                  }
                  left-=count;
               }
               break;
            }
         }
      }
   }

   return result;
}
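// Sketch, not part of stb_image: how the "Mixed RLE" case (packet type 2) above
// sizes its chunks, shown for a scanline whose pixels are a single byte each.
// A count byte >= 128 starts a repeat: 128 means a 16-bit big-endian count
// follows, otherwise the length is count - 127. A count byte < 128 starts a raw
// chunk of count + 1 values. sketch_pic_mixed_rle is hypothetical; it returns
// the number of source bytes consumed, or -1 on scanline overrun or truncation.
static int sketch_pic_mixed_rle(const unsigned char *src, int src_len,
                                unsigned char *dst, int width)
{
   int pos = 0, left = width, i;
   while (left > 0) {
      int count;
      if (pos >= src_len) return -1;
      count = src[pos++];
      if (count >= 128) {                          // repeated value
         if (count == 128) {
            if (pos + 2 > src_len) return -1;
            count = (src[pos] << 8) | src[pos + 1];
            pos += 2;
         } else {
            count -= 127;
         }
         if (count > left || pos >= src_len) return -1;
         for (i = 0; i < count; ++i) *dst++ = src[pos];
         ++pos;
      } else {                                     // raw values
         ++count;
         if (count > left || pos + count > src_len) return -1;
         for (i = 0; i < count; ++i) *dst++ = src[pos++];
      }
      left -= count;
   }
   return pos;
}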

static void *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp, stbi__result_info *ri)
{
   stbi_uc *result;
   int i, x,y, internal_comp;
   STBI_NOTUSED(ri);

   if (!comp) comp = &internal_comp;

   for (i=0; i<92; ++i)
      stbi__get8(s);

   x = stbi__get16be(s);
   y = stbi__get16be(s);

   if (y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   if (x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");

   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
   if (!stbi__mad3sizes_valid(x, y, 4, 0)) return stbi__errpuc("too large", "PIC image too large to decode");

   stbi__get32be(s); //skip `ratio'
   stbi__get16be(s); //skip `fields'
   stbi__get16be(s); //skip `pad'

   // intermediate buffer is RGBA
6344    result = (stbi_uc *) stbi__malloc_mad3(x, y, 4, 0);
6345    if (!result) return stbi__errpuc("outofmem", "Out of memory");
6346    memset(result, 0xff, x*y*4);
6347    if (!stbi__pic_load_core(s,x,y,comp, result)) {
6348       STBI_FREE(result);
6349       return 0; // don't hand a NULL buffer (or an unset *comp) to stbi__convert_format
6350    }
6351    *px = x;
6352    *py = y;
6353    if (req_comp == 0) req_comp = *comp;
6354    result=stbi__convert_format(result,4,req_comp,x,y);
6355
6356    return result;
6357 }
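stbi__pic_load reaches the image size by skipping 92 header bytes and then reading width and height as big-endian 16-bit values. stbi__pic_info gets to the same offset by matching the 4-byte magic `53 80 F6 34` and skipping 88 more bytes. The sketch below reads that layout from an in-memory buffer. `pic_be16` and `pic_header_dims` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Big-endian 16-bit read, the in-memory analogue of stbi__get16be. */
static int pic_be16(const unsigned char *p) { return (p[0] << 8) | p[1]; }

/* Hypothetical helper: width/height from an in-memory PIC header.
   Offsets follow stbi__pic_load: skip 92 bytes, then width, then height. */
static int pic_header_dims(const unsigned char *buf, size_t len, int *w, int *h)
{
    if (len < 96) return 0;                            /* need bytes 92..95 */
    if (buf[0] != 0x53 || buf[1] != 0x80 || buf[2] != 0xF6 || buf[3] != 0x34)
        return 0;                                      /* magic checked by stbi__pic_info */
    *w = pic_be16(buf + 92);
    *h = pic_be16(buf + 94);
    return 1;
}
```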
6358
6359 static int stbi__pic_test(stbi__context *s)
6360 {
6361    int r = stbi__pic_test_core(s);
6362    stbi__rewind(s);
6363    return r;
6364 }
6365 #endif
6366
6367 // *************************************************************************************************
6368 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
6369
6370 #ifndef STBI_NO_GIF
6371 typedef struct
6372 {
6373    stbi__int16 prefix;
6374    stbi_uc first;
6375    stbi_uc suffix;
6376 } stbi__gif_lzw;
6377
6378 typedef struct
6379 {
6380    int w,h;
6381    stbi_uc *out;                 // output buffer (always 4 components)
6382    stbi_uc *background;          // The current "background" as far as a gif is concerned
6383    stbi_uc *history;
6384    int flags, bgindex, ratio, transparent, eflags;
6385    stbi_uc  pal[256][4];
6386    stbi_uc lpal[256][4];
6387    stbi__gif_lzw codes[8192];
6388    stbi_uc *color_table;
6389    int parse, step;
6390    int lflags;
6391    int start_x, start_y;
6392    int max_x, max_y;
6393    int cur_x, cur_y;
6394    int line_size;
6395    int delay;
6396 } stbi__gif;
6397
6398 static int stbi__gif_test_raw(stbi__context *s)
6399 {
6400    int sz;
6401    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
6402    sz = stbi__get8(s);
6403    if (sz != '9' && sz != '7') return 0;
6404    if (stbi__get8(s) != 'a') return 0;
6405    return 1;
6406 }
6407
6408 static int stbi__gif_test(stbi__context *s)
6409 {
6410    int r = stbi__gif_test_raw(s);
6411    stbi__rewind(s);
6412    return r;
6413 }
6414
6415 static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
6416 {
6417    int i;
6418    for (i=0; i < num_entries; ++i) {
6419       pal[i][2] = stbi__get8(s);
6420       pal[i][1] = stbi__get8(s);
6421       pal[i][0] = stbi__get8(s);
6422       pal[i][3] = transp == i ? 0 : 255;
6423    }
6424 }
6425
6426 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
6427 {
6428    stbi_uc version;
6429    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
6430       return stbi__err("not GIF", "Corrupt GIF");
6431
6432    version = stbi__get8(s);
6433    if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
6434    if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");
6435
6436    stbi__g_failure_reason = "";
6437    g->w = stbi__get16le(s);
6438    g->h = stbi__get16le(s);
6439    g->flags = stbi__get8(s);
6440    g->bgindex = stbi__get8(s);
6441    g->ratio = stbi__get8(s);
6442    g->transparent = -1;
6443
6444    if (g->w > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
6445    if (g->h > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
6446
6447    if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments
6448
6449    if (is_info) return 1;
6450
6451    if (g->flags & 0x80)
6452       stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
6453
6454    return 1;
6455 }
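stbi__gif_test_raw and stbi__gif_header accept only the signatures `GIF87a` and `GIF89a`. After the signature they read the logical screen descriptor: little-endian width and height, a flags byte, the background index, and the aspect-ratio byte. Bit 7 of the flags announces a global color table, whose entry count is `2 << (flags & 7)`. The sketch below performs the same parse over a byte buffer. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {                 /* hypothetical subset of stbi__gif */
    int w, h, flags, bgindex, ratio;
    int global_entries;          /* 0 when there is no global color table */
} gif_screen;

/* Little-endian 16-bit read, the in-memory analogue of stbi__get16le. */
static int gif_le16(const unsigned char *p) { return p[0] | (p[1] << 8); }

/* Parses the 6-byte signature and the 7-byte logical screen descriptor. */
static int parse_gif_screen(const unsigned char *buf, size_t len, gif_screen *g)
{
    if (len < 13) return 0;
    if (memcmp(buf, "GIF87a", 6) != 0 && memcmp(buf, "GIF89a", 6) != 0) return 0;
    g->w       = gif_le16(buf + 6);
    g->h       = gif_le16(buf + 8);
    g->flags   = buf[10];
    g->bgindex = buf[11];
    g->ratio   = buf[12];
    g->global_entries = (g->flags & 0x80) ? 2 << (g->flags & 7) : 0;
    return 1;
}
```

The table itself follows immediately as RGB triplets. stbi__gif_parse_colortable stores each entry with red and blue swapped and sets alpha to 0 only for the transparent index. stbi__out_gif_code swaps the channels back when it writes pixels.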
6456
6457 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
6458 {
6459    stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif));
6460    if (!g || !stbi__gif_header(s, g, comp, 1)) { // g is NULL if the allocation failed
6461       STBI_FREE(g);
6462       stbi__rewind( s );
6463       return 0;
6464    }
6465    if (x) *x = g->w;
6466    if (y) *y = g->h;
6467    STBI_FREE(g);
6468    return 1;
6469 }
6470
6471 static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
6472 {
6473    stbi_uc *p, *c;
6474    int idx;
6475
6476    // recurse to decode the prefixes, since the linked-list is backwards,
6477    // and working backwards through an interleaved image would be nasty
6478    if (g->codes[code].prefix >= 0)
6479       stbi__out_gif_code(g, g->codes[code].prefix);
6480
6481    if (g->cur_y >= g->max_y) return;
6482
6483    idx = g->cur_x + g->cur_y;
6484    p = &g->out[idx];
6485    g->history[idx / 4] = 1;
6486
6487    c = &g->color_table[g->codes[code].suffix * 4];
6488    if (c[3] > 128) { // don't render transparent pixels;
6489       p[0] = c[2];
6490       p[1] = c[1];
6491       p[2] = c[0];
6492       p[3] = c[3];
6493    }
6494    g->cur_x += 4;
6495
6496    if (g->cur_x >= g->max_x) {
6497       g->cur_x = g->start_x;
6498       g->cur_y += g->step;
6499
6500       while (g->cur_y >= g->max_y && g->parse > 0) {
6501          g->step = (1 << g->parse) * g->line_size;
6502          g->cur_y = g->start_y + (g->step >> 1);
6503          --g->parse;
6504       }
6505    }
6506 }
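For interlaced frames, stbi__out_gif_code keeps `step` and `parse` in byte units (multiples of `line_size`). Each time `cur_y` runs off the bottom, it halves the step and restarts at the next pass offset. Counted in rows instead of bytes, the same bookkeeping produces the GIF interlace order: rows 0, 8, 16, ..., then 4, 12, ..., then 2, 6, ..., then the odd rows. `gif_interlace_order` in the sketch below is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical helper: fills `rows` with the order in which an interlaced
   GIF frame of `height` rows is decoded, using stbi__out_gif_code's
   step/parse bookkeeping in row units. Returns the number of rows written. */
static int gif_interlace_order(int height, int *rows)
{
    int step = 8, parse = 3, cur = 0, n = 0;   /* pass 1: every 8th row from 0 */
    while (cur < height) {
        rows[n++] = cur;
        cur += step;
        while (cur >= height && parse > 0) {   /* ran off the bottom: next pass */
            step = 1 << parse;                 /* 8, then 4, then 2 */
            cur = step >> 1;                   /* starts at 4, then 2, then 1 */
            --parse;
        }
    }
    return n;
}
```

The inner `while` matters for short frames. With a height of 1, every later pass starts past the last row, so the loop skips straight through them.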
6507
6508 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
6509 {
6510    stbi_uc lzw_cs;
6511    stbi__int32 len, init_code;
6512    stbi__uint32 first;
6513    stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
6514    stbi__gif_lzw *p;
6515
6516    lzw_cs = stbi__get8(s);
6517    if (lzw_cs > 12) return NULL;
6518    clear = 1 << lzw_cs;
6519    first = 1;
6520    codesize = lzw_cs + 1;
6521    codemask = (1 << codesize) - 1;
6522    bits = 0;
6523    valid_bits = 0;
6524    for (init_code = 0; init_code < clear; init_code++) {
6525       g->codes[init_code].prefix = -1;
6526       g->codes[init_code].first = (stbi_uc) init_code;
6527       g->codes[init_code].suffix = (stbi_uc) init_code;
6528    }
6529
6530    // support no starting clear code
6531    avail = clear+2;
6532    oldcode = -1;
6533
6534    len = 0;
6535    for(;;) {
6536       if (valid_bits < codesize) {
6537          if (len == 0) {
6538             len = stbi__get8(s); // start new block
6539             if (len == 0)
6540                return g->out;
6541          }
6542          --len;
6543          bits |= (stbi__int32) stbi__get8(s) << valid_bits;
6544          valid_bits += 8;
6545       } else {
6546          stbi__int32 code = bits & codemask;
6547          bits >>= codesize;
6548          valid_bits -= codesize;
6549          // @OPTIMIZE: is there some way we can accelerate the non-clear path?
6550          if (code == clear) {  // clear code
6551             codesize = lzw_cs + 1;
6552             codemask = (1 << codesize) - 1;
6553             avail = clear + 2;
6554             oldcode = -1;
6555             first = 0;
6556          } else if (code == clear + 1) { // end of stream code
6557             stbi__skip(s, len);
6558             while ((len = stbi__get8(s)) > 0)
6559                stbi__skip(s,len);
6560             return g->out;
6561          } else if (code <= avail) {
6562             if (first) {
6563                return stbi__errpuc("no clear code", "Corrupt GIF");
6564             }
6565
6566             if (oldcode >= 0) {
6567                p = &g->codes[avail++];
6568                if (avail > 8192) {
6569                   return stbi__errpuc("too many codes", "Corrupt GIF");
6570                }
6571
6572                p->prefix = (stbi__int16) oldcode;
6573                p->first = g->codes[oldcode].first;
6574                p->suffix = (code == avail) ? p->first : g->codes[code].first;
6575             } else if (code == avail)
6576                return stbi__errpuc("illegal code in raster", "Corrupt GIF");
6577
6578             stbi__out_gif_code(g, (stbi__uint16) code);
6579
6580             if ((avail & codemask) == 0 && avail <= 0x0FFF) {
6581                codesize++;
6582                codemask = (1 << codesize) - 1;
6583             }
6584
6585             oldcode = code;
6586          } else {
6587             return stbi__errpuc("illegal code in raster", "Corrupt GIF");
6588          }
6589       }
6590    }
6591 }
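stbi__process_gif_raster pulls variable-width LZW codes out of the data sub-blocks least-significant bit first. Each new byte is OR-ed in above the `valid_bits` already buffered, and a code is the low `codesize` bits. The sketch below implements only that bit reader, over a flat buffer, so it ignores the length-prefixed sub-block framing. The struct and function names are hypothetical. For example, with a minimum code size of 2, codes start 3 bits wide, 4 is the clear code and 5 is end-of-information. The code sequence 4, 1, 6, 5 therefore packs into the two bytes 0x8C 0x0B.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {                 /* hypothetical reader state */
    const unsigned char *data;
    size_t len, pos;
    unsigned int bits;           /* buffered bits, oldest in the low end */
    int valid_bits;              /* how many of `bits` are meaningful */
} lzw_bits;

/* Returns the next `codesize`-bit code, or -1 when the input runs out. */
static int lzw_read_code(lzw_bits *r, int codesize)
{
    int code;
    while (r->valid_bits < codesize) {
        if (r->pos >= r->len) return -1;
        r->bits |= (unsigned int)r->data[r->pos++] << r->valid_bits;
        r->valid_bits += 8;
    }
    code = (int)(r->bits & ((1u << codesize) - 1));
    r->bits >>= codesize;
    r->valid_bits -= codesize;
    return code;
}
```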
6592
6593 // this function is designed to support animated gifs, although stb_image doesn't support it
6594 // two back is the image from two frames ago, used for a very specific disposal format
6595 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp, stbi_uc *two_back)
6596 {
6597    int dispose;
6598    int first_frame;
6599    int pi;
6600    int pcount;
6601    STBI_NOTUSED(req_comp);
6602
6603    // on first frame, any non-written pixels get the background colour (non-transparent)
6604    first_frame = 0;
6605    if (g->out == 0) {
6606       if (!stbi__gif_header(s, g, comp,0)) return 0; // stbi__g_failure_reason set by stbi__gif_header
6607       if (!stbi__mad3sizes_valid(4, g->w, g->h, 0))
6608          return stbi__errpuc("too large", "GIF image is too large");
6609       pcount = g->w * g->h;
6610       g->out = (stbi_uc *) stbi__malloc(4 * pcount);
6611       g->background = (stbi_uc *) stbi__malloc(4 * pcount);
6612       g->history = (stbi_uc *) stbi__malloc(pcount);
6613       if (!g->out || !g->background || !g->history)
6614          return stbi__errpuc("outofmem", "Out of memory");
6615
6616       // image is treated as "transparent" at the start - i.e., nothing overwrites the current background;
6617       // background colour is only used for pixels that are not rendered first frame, after that "background"
6618       // color refers to the color that was there the previous frame.
6619       memset(g->out, 0x00, 4 * pcount);
6620       memset(g->background, 0x00, 4 * pcount); // state of the background (starts transparent)
6621       memset(g->history, 0x00, pcount);        // pixels that were affected previous frame
6622       first_frame = 1;
6623    } else {
6624       // subsequent frame - how do we dispose of the previous one?
6625       dispose = (g->eflags & 0x1C) >> 2;
6626       pcount = g->w * g->h;
6627
6628       if ((dispose == 3) && (two_back == 0)) {
6629          dispose = 2; // if I don't have an image to revert back to, default to the old background
6630       }
6631
6632       if (dispose == 3) { // use previous graphic
6633          for (pi = 0; pi < pcount; ++pi) {
6634             if (g->history[pi]) {
6635                memcpy( &g->out[pi * 4], &two_back[pi * 4], 4 );
6636             }
6637          }
6638       } else if (dispose == 2) {
6639          // restore what was changed last frame to background before that frame;
6640          for (pi = 0; pi < pcount; ++pi) {
6641             if (g->history[pi]) {
6642                memcpy( &g->out[pi * 4], &g->background[pi * 4], 4 );
6643             }
6644          }
6645       } else {
6646          // This is a non-disposal case either way, so just
6647          // leave the pixels as is, and they will become the new background
6648          // 1: do not dispose
6649          // 0:  not specified.
6650       }
6651
6652       // background is what out is after undoing the previous frame;
6653       memcpy( g->background, g->out, 4 * g->w * g->h );
6654    }
6655
6656    // clear my history;
6657    memset( g->history, 0x00, g->w * g->h );        // pixels that were affected previous frame
6658
6659    for (;;) {
6660       int tag = stbi__get8(s);
6661       switch (tag) {
6662          case 0x2C: /* Image Descriptor */
6663          {
6664             stbi__int32 x, y, w, h;
6665             stbi_uc *o;
6666
6667             x = stbi__get16le(s);
6668             y = stbi__get16le(s);
6669             w = stbi__get16le(s);
6670             h = stbi__get16le(s);
6671             if (((x + w) > (g->w)) || ((y + h) > (g->h)))
6672                return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
6673
6674             g->line_size = g->w * 4;
6675             g->start_x = x * 4;
6676             g->start_y = y * g->line_size;
6677             g->max_x   = g->start_x + w * 4;
6678             g->max_y   = g->start_y + h * g->line_size;
6679             g->cur_x   = g->start_x;
6680             g->cur_y   = g->start_y;
6681
6682             // if the width of the specified rectangle is 0, that means
6683             // we may not see *any* pixels or the image is malformed;
6684             // to make sure this is caught, move the current y down to
6685             // max_y (which is what out_gif_code checks).
6686             if (w == 0)
6687                g->cur_y = g->max_y;
6688
6689             g->lflags = stbi__get8(s);
6690
6691             if (g->lflags & 0x40) {
6692                g->step = 8 * g->line_size; // first interlaced spacing
6693                g->parse = 3;
6694             } else {
6695                g->step = g->line_size;
6696                g->parse = 0;
6697             }
6698
6699             if (g->lflags & 0x80) {
6700                stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
6701                g->color_table = (stbi_uc *) g->lpal;
6702             } else if (g->flags & 0x80) {
6703                g->color_table = (stbi_uc *) g->pal;
6704             } else
6705                return stbi__errpuc("missing color table", "Corrupt GIF");
6706
6707             o = stbi__process_gif_raster(s, g);
6708             if (!o) return NULL;
6709
6710             // if this was the first frame,
6711             pcount = g->w * g->h;
6712             if (first_frame && (g->bgindex > 0)) {
6713                // if first frame, any pixel not drawn to gets the background color
6714                for (pi = 0; pi < pcount; ++pi) {
6715                   if (g->history[pi] == 0) {
6716                      g->pal[g->bgindex][3] = 255; // just in case it was made transparent, undo that; It will be reset next frame if need be;
6717                      memcpy( &g->out[pi * 4], &g->pal[g->bgindex], 4 );
6718                   }
6719                }
6720             }
6721
6722             return o;
6723          }
6724
6725          case 0x21: // Extension Introducer (graphic control, comment, application, ...)
6726          {
6727             int len;
6728             int ext = stbi__get8(s);
6729             if (ext == 0xF9) { // Graphic Control Extension.
6730                len = stbi__get8(s);
6731                if (len == 4) {
6732                   g->eflags = stbi__get8(s);
6733                   g->delay = 10 * stbi__get16le(s); // delay - 1/100th of a second, saving as 1/1000ths.
6734
6735                   // unset old transparent
6736                   if (g->transparent >= 0) {
6737                      g->pal[g->transparent][3] = 255;
6738                   }
6739                   if (g->eflags & 0x01) {
6740                      g->transparent = stbi__get8(s);
6741                      if (g->transparent >= 0) {
6742                         g->pal[g->transparent][3] = 0;
6743                      }
6744                   } else {
6745                      // don't need transparent
6746                      stbi__skip(s, 1);
6747                      g->transparent = -1;
6748                   }
6749                } else {
6750                   stbi__skip(s, len);
6751                   break;
6752                }
6753             }
6754             while ((len = stbi__get8(s)) != 0) {
6755                stbi__skip(s, len);
6756             }
6757             break;
6758          }
6759
6760          case 0x3B: // gif stream termination code
6761             return (stbi_uc *) s; // using '1' causes warning on some compilers
6762
6763          default:
6764             return stbi__errpuc("unknown code", "Corrupt GIF");
6765       }
6766    }
6767 }
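The Graphic Control Extension parsed in the 0x21 branch stores three things: the disposal method in bits 2-4 of its flags byte, a little-endian delay in hundredths of a second, and a transparent color index that is used when bit 0 is set. At the start of the next frame, stbi__gif_load_next uses that disposal method to restore only the pixels recorded in `history`. The sketch below performs both steps over plain buffers, assuming RGBA pixels. `parse_gce` and `apply_dispose` are hypothetical helpers, and the GCE is assumed to be the 4-byte body that follows the 0x04 length byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decodes the 4-byte Graphic Control Extension body
   (flags, delay lo, delay hi, transparent index). */
static void parse_gce(const unsigned char b[4], int *dispose, int *delay_ms, int *transparent)
{
    int eflags = b[0];
    *dispose     = (eflags & 0x1C) >> 2;          /* bits 2-4: disposal method */
    *delay_ms    = 10 * (b[1] | (b[2] << 8));     /* stored in 1/100 s */
    *transparent = (eflags & 0x01) ? b[3] : -1;   /* bit 0: transparency flag */
}

/* Hypothetical helper: undoes the previous frame for the pixels it touched
   (history[i] != 0). Buffers are RGBA, 4 bytes per pixel. */
static void apply_dispose(unsigned char *out, const unsigned char *background,
                          const unsigned char *two_back, const unsigned char *history,
                          int pcount, int dispose)
{
    int i;
    if (dispose == 3 && two_back == NULL) dispose = 2;   /* nothing to revert to */
    for (i = 0; i < pcount; ++i) {
        if (!history[i]) continue;
        if (dispose == 3)      memcpy(out + 4 * i, two_back + 4 * i, 4);
        else if (dispose == 2) memcpy(out + 4 * i, background + 4 * i, 4);
        /* 0 (unspecified) and 1 (do not dispose): keep the pixel as drawn */
    }
}
```

Methods 0 and 1 leave the previous frame in place. Method 3 falls back to method 2 when no earlier frame is available, matching the `two_back == 0` check in stbi__gif_load_next.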
6768
6769 static void *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
6770 {
6771    if (stbi__gif_test(s)) {
6772       int layers = 0;
6773       stbi_uc *u = 0;
6774       stbi_uc *out = 0;
6775       stbi_uc *two_back = 0;
6776       stbi__gif g;
6777       int stride;
6778       int out_size = 0;
6779       int delays_size = 0;
6780       memset(&g, 0, sizeof(g));
6781       if (delays) {
6782          *delays = 0;
6783       }
6784
6785       do {
6786          u = stbi__gif_load_next(s, &g, comp, req_comp, two_back);
6787          if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
6788
6789          if (u) {
6790             *x = g.w;
6791             *y = g.h;
6792             ++layers;
6793             stride = g.w * g.h * 4;
6794
6795             if (out) {
6796                void *tmp = (stbi_uc*) STBI_REALLOC_SIZED( out, out_size, layers * stride );
6797                if (NULL == tmp) {
6798                   STBI_FREE(g.out);
6799                   STBI_FREE(g.history);
6800                   STBI_FREE(g.background);
6801                   return stbi__errpuc("outofmem", "Out of memory");
6802                }
6803                else {
6804                    out = (stbi_uc*) tmp;
6805                    out_size = layers * stride;
6806                }
6807
6808                if (delays) {
6809                   *delays = (int*) STBI_REALLOC_SIZED( *delays, delays_size, sizeof(int) * layers );
6810                   delays_size = layers * sizeof(int);
6811                }
6812             } else {
6813                out = (stbi_uc*)stbi__malloc( layers * stride );
6814                out_size = layers * stride;
6815                if (delays) {
6816                   *delays = (int*) stbi__malloc( layers * sizeof(int) );
6817                   delays_size = layers * sizeof(int);
6818                }
6819             }
6820             memcpy( out + ((layers - 1) * stride), u, stride );
6821             if (layers >= 2) {
6822                two_back = out + (layers - 2) * stride; // frame stored two layers ago
6823             }
6824
6825             if (delays) {
6826                (*delays)[layers - 1U] = g.delay;
6827             }
6828          }
6829       } while (u != 0);
6830
6831       // free temp buffer;
6832       STBI_FREE(g.out);
6833       STBI_FREE(g.history);
6834       STBI_FREE(g.background);
6835
6836       // do the final conversion after loading everything;
6837       if (req_comp && req_comp != 4)
6838          out = stbi__convert_format(out, 4, req_comp, layers * g.w, g.h);
6839
6840       *z = layers;
6841       return out;
6842    } else {
6843       return stbi__errpuc("not GIF", "Image was not a GIF.");
6844    }
6845 }
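stbi__load_gif_main stacks every decoded frame back to back in a single buffer of `layers * stride` bytes, with `stride = w * h * 4`. The final stbi__convert_format call treats that stack as one `(layers * w) x h` image, so after conversion each frame occupies `w * h * req_comp` bytes. When the next frame uses disposal method 3, it needs the frame stored at index `layers - 2`. The sketch below shows that addressing. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: start of frame k in a back-to-back frame stack. */
static unsigned char *gif_frame(unsigned char *frames, int w, int h, int comp, int k)
{
    return frames + (size_t)k * (size_t)w * (size_t)h * (size_t)comp;
}

/* Hypothetical helper: the frame disposal method 3 restores from when the
   next frame is decoded, given `layers` RGBA frames already stored. */
static unsigned char *gif_two_back(unsigned char *frames, int w, int h, int layers)
{
    return layers >= 2 ? gif_frame(frames, w, h, 4, layers - 2) : NULL;
}
```

Callers normally reach this code through stbi_load_gif_from_memory. It returns the frame stack and stores the frame count in `*z`. When `delays` is non-NULL, it also returns a per-frame array of delays in milliseconds.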
6846
6847 static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
6848 {
6849    stbi_uc *u = 0;
6850    stbi__gif g;
6851    memset(&g, 0, sizeof(g));
6852    STBI_NOTUSED(ri);
6853
6854    u = stbi__gif_load_next(s, &g, comp, req_comp, 0);
6855    if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
6856    if (u) {
6857       *x = g.w;
6858       *y = g.h;
6859
6860       // moved conversion to after successful load so that the same
6861       // can be done for multiple frames.
6862       if (req_comp && req_comp != 4)
6863          u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
6864    } else if (g.out) {
6865       // if there was an error and we allocated an image buffer, free it!
6866       STBI_FREE(g.out);
6867    }
6868
6869    // free buffers needed for multiple frame loading;
6870    STBI_FREE(g.history);
6871    STBI_FREE(g.background);
6872
6873    return u;
6874 }
6875
6876 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
6877 {
6878    return stbi__gif_info_raw(s,x,y,comp);
6879 }
6880 #endif
6881
6882 // *************************************************************************************************
6883 // Radiance RGBE HDR loader
6884 // originally by Nicolas Schulz
6885 #ifndef STBI_NO_HDR
6886 static int stbi__hdr_test_core(stbi__context *s, const char *signature)
6887 {
6888    int i;
6889    for (i=0; signature[i]; ++i)
6890       if (stbi__get8(s) != signature[i])
6891           return 0;
6892    stbi__rewind(s);
6893    return 1;
6894 }
6895
6896 static int stbi__hdr_test(stbi__context* s)
6897 {
6898    int r = stbi__hdr_test_core(s, "#?RADIANCE\n");
6899    stbi__rewind(s);
6900    if(!r) {
6901        r = stbi__hdr_test_core(s, "#?RGBE\n");
6902        stbi__rewind(s);
6903    }
6904    return r;
6905 }
6906
6907 #define STBI__HDR_BUFLEN  1024
6908 static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
6909 {
6910    int len=0;
6911    char c = '\0';
6912
6913    c = (char) stbi__get8(z);
6914
6915    while (!stbi__at_eof(z) && c != '\n') {
6916       buffer[len++] = c;
6917       if (len == STBI__HDR_BUFLEN-1) {
6918          // flush to end of line
6919          while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
6920             ;
6921          break;
6922       }
6923       c = (char) stbi__get8(z);
6924    }
6925
6926    buffer[len] = 0;
6927    return buffer;
6928 }
6929
6930 static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
6931 {
6932    if ( input[3] != 0 ) {
6933       float f1;
6934       // Exponent
6935       f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
6936       if (req_comp <= 2)
6937          output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
6938       else {
6939          output[0] = input[0] * f1;
6940          output[1] = input[1] * f1;
6941          output[2] = input[2] * f1;
6942       }
6943       if (req_comp == 2) output[1] = 1;
6944       if (req_comp == 4) output[3] = 1;
6945    } else {
6946       switch (req_comp) {
6947          case 4: output[3] = 1; /* fallthrough */
6948          case 3: output[0] = output[1] = output[2] = 0;
6949                  break;
6950          case 2: output[1] = 1; /* fallthrough */
6951          case 1: output[0] = 0;
6952                  break;
6953       }
6954    }
6955 }
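In the Radiance RGBE encoding handled by stbi__hdr_convert, the three mantissa bytes share the exponent byte E. Each channel decodes to `m * 2^(E - 136)`, which equals `(m / 256) * 2^(E - 128)`, and an exponent of 0 means black. The sketch below covers three-channel output only. stb_image additionally averages to grey for one or two components and appends alpha = 1 for four. `rgbe_to_rgb` is a hypothetical name.

```c
#include <assert.h>
#include <math.h>

/* Sketch of RGBE -> linear float for three output channels. */
static void rgbe_to_rgb(const unsigned char rgbe[4], float rgb[3])
{
    float f;
    if (rgbe[3] == 0) {                          /* exponent 0 encodes black */
        rgb[0] = rgb[1] = rgb[2] = 0.0f;
        return;
    }
    f = (float)ldexp(1.0, rgbe[3] - (128 + 8)); /* 2^(E - 136) */
    rgb[0] = rgbe[0] * f;
    rgb[1] = rgbe[1] * f;
    rgb[2] = rgbe[2] * f;
}
```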
6956
6957 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
6958 {
6959    char buffer[STBI__HDR_BUFLEN];
6960    char *token;
6961    int valid = 0;
6962    int width, height;
6963    stbi_uc *scanline;
6964    float *hdr_data;
6965    int len;
6966    unsigned char count, value;
6967    int i, j, k, c1,c2, z;
6968    const char *headerToken;
6969    STBI_NOTUSED(ri);
6970
6971    // Check identifier
6972    headerToken = stbi__hdr_gettoken(s,buffer);
6973    if (strcmp(headerToken, "#?RADIANCE") != 0 && strcmp(headerToken, "#?RGBE") != 0)
6974       return stbi__errpf("not HDR", "Corrupt HDR image");
6975
6976    // Parse header
6977    for(;;) {
6978       token = stbi__hdr_gettoken(s,buffer);
6979       if (token[0] == 0) break;
6980       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
6981    }
6982
6983    if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");
6984
6985    // Parse width and height
6986    // can't use sscanf() if we're not using stdio!
6987    token = stbi__hdr_gettoken(s,buffer);
6988    if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
6989    token += 3;
6990    height = (int) strtol(token, &token, 10);
6991    while (*token == ' ') ++token;
6992    if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
6993    token += 3;
6994    width = (int) strtol(token, NULL, 10);
6995
6996    if (height > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
6997    if (width > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
6998
6999    *x = width;
7000    *y = height;
7001
7002    if (comp) *comp = 3;
7003    if (req_comp == 0) req_comp = 3;
7004
7005    if (!stbi__mad4sizes_valid(width, height, req_comp, sizeof(float), 0))
7006       return stbi__errpf("too large", "HDR image is too large");
7007
7008    // Read data
7009    hdr_data = (float *) stbi__malloc_mad4(width, height, req_comp, sizeof(float), 0);
7010    if (!hdr_data)
7011       return stbi__errpf("outofmem", "Out of memory");
7012
7013    // Load image data
7014    // image data is stored as some number of scanlines, either flat RGBE pixels or run-length encoded
7015    if ( width < 8 || width >= 32768) {
7016       // Read flat data
7017       for (j=0; j < height; ++j) {
7018          for (i=0; i < width; ++i) {
7019             stbi_uc rgbe[4];
7020            main_decode_loop:
7021             stbi__getn(s, rgbe, 4);
7022             stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
7023          }
7024       }
7025    } else {
7026       // Read RLE-encoded data
7027       scanline = NULL;
7028
7029       for (j = 0; j < height; ++j) {
7030          c1 = stbi__get8(s);
7031          c2 = stbi__get8(s);
7032          len = stbi__get8(s);
7033          if (c1 != 2 || c2 != 2 || (len & 0x80)) {
7034             // not run-length encoded, so we have to actually use THIS data as a decoded
7035             // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
7036             stbi_uc rgbe[4];
7037             rgbe[0] = (stbi_uc) c1;
7038             rgbe[1] = (stbi_uc) c2;
7039             rgbe[2] = (stbi_uc) len;
7040             rgbe[3] = (stbi_uc) stbi__get8(s);
7041             stbi__hdr_convert(hdr_data, rgbe, req_comp);
7042             i = 1;
7043             j = 0;
7044             STBI_FREE(scanline);
7045             goto main_decode_loop; // yes, this makes no sense
7046          }
7047          len <<= 8;
7048          len |= stbi__get8(s);
7049          if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
7050          if (scanline == NULL) {
7051             scanline = (stbi_uc *) stbi__malloc_mad2(width, 4, 0);
7052             if (!scanline) {
7053                STBI_FREE(hdr_data);
7054                return stbi__errpf("outofmem", "Out of memory");
7055             }
7056          }
7057
7058          for (k = 0; k < 4; ++k) {
7059             int nleft;
7060             i = 0;
7061             while ((nleft = width - i) > 0) {
7062                count = stbi__get8(s);
7063                if (count > 128) {
7064                   // Run
7065                   value = stbi__get8(s);
7066                   count -= 128;
7067                   if (count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
7068                   for (z = 0; z < count; ++z)
7069                      scanline[i++ * 4 + k] = value;
7070                } else {
7071                   // Dump
7072                   if ((count == 0) || (count > nleft)) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
7073                   for (z = 0; z < count; ++z)
7074                      scanline[i++ * 4 + k] = stbi__get8(s);
7075                }
7076             }
7077          }
7078          for (i=0; i < width; ++i)
7079             stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
7080       }
7081       if (scanline)
7082          STBI_FREE(scanline);
7083    }
7084
7085    return hdr_data;
7086 }
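For widths from 8 to 32767, stbi__hdr_load expects each scanline to start with the bytes 2, 2 and a big-endian width (high bit clear). Four separately run-length-encoded planes (R, G, B, E) follow. Within a plane, a count above 128 means one byte repeated `count - 128` times, and a count of 1 to 128 means that many literal bytes. The sketch below decodes one plane into a contiguous array rather than stb_image's interleaved `scanline[i * 4 + k]`. The helper names are hypothetical, and a zero count is rejected as corrupt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 1 if h[0..3] is a new-style RLE scanline header
   (2, 2, then a big-endian width with the high bit clear) for `width`. */
static int hdr_rle_header_ok(const unsigned char h[4], int width)
{
    if (h[0] != 2 || h[1] != 2 || (h[2] & 0x80)) return 0;
    return ((h[2] << 8) | h[3]) == width;
}

/* Hypothetical helper: decodes one RLE plane of `width` bytes.
   Returns the number of input bytes consumed, or -1 on corrupt data. */
static long hdr_rle_plane(const unsigned char *in, size_t in_len, unsigned char *out, int width)
{
    size_t pos = 0;
    int i = 0, z;
    while (i < width) {
        int count, nleft = width - i;
        if (pos >= in_len) return -1;
        count = in[pos++];
        if (count > 128) {                       /* run: one value, count - 128 times */
            count -= 128;
            if (count > nleft || pos >= in_len) return -1;
            for (z = 0; z < count; ++z) out[i++] = in[pos];
            pos++;
        } else {                                 /* dump: count literal bytes */
            if (count == 0 || count > nleft || in_len - pos < (size_t)count) return -1;
            for (z = 0; z < count; ++z) out[i++] = in[pos++];
        }
    }
    return (long)pos;
}
```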
7087
7088 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
7089 {
7090    char buffer[STBI__HDR_BUFLEN];
7091    char *token;
7092    int valid = 0;
7093    int dummy;
7094
7095    if (!x) x = &dummy;
7096    if (!y) y = &dummy;
7097    if (!comp) comp = &dummy;
7098
7099    if (stbi__hdr_test(s) == 0) {
7100        stbi__rewind( s );
7101        return 0;
7102    }
7103
7104    for(;;) {
7105       token = stbi__hdr_gettoken(s,buffer);
7106       if (token[0] == 0) break;
7107       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
7108    }
7109
7110    if (!valid) {
7111        stbi__rewind( s );
7112        return 0;
7113    }
7114    token = stbi__hdr_gettoken(s,buffer);
7115    if (strncmp(token, "-Y ", 3)) {
7116        stbi__rewind( s );
7117        return 0;
7118    }
7119    token += 3;
7120    *y = (int) strtol(token, &token, 10);
7121    while (*token == ' ') ++token;
7122    if (strncmp(token, "+X ", 3)) {
7123        stbi__rewind( s );
7124        return 0;
7125    }
7126    token += 3;
7127    *x = (int) strtol(token, NULL, 10);
7128    *comp = 3;
7129    return 1;
7130 }
7131 #endif // STBI_NO_HDR
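stbi__hdr_info and stbi__hdr_load accept only the standard resolution line `-Y <height> +X <width>`, meaning rows are stored top to bottom and pixels left to right. They parse it with strtol so they do not need sscanf. `parse_hdr_resolution` in the sketch below is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parses "-Y <height> +X <width>". Returns 1 on success. */
static int parse_hdr_resolution(const char *line, int *w, int *h)
{
    char *end;
    long hv, wv;
    if (strncmp(line, "-Y ", 3) != 0) return 0;      /* rows top to bottom */
    hv = strtol(line + 3, &end, 10);
    while (*end == ' ') ++end;
    if (strncmp(end, "+X ", 3) != 0) return 0;       /* pixels left to right */
    wv = strtol(end + 3, NULL, 10);
    *h = (int)hv;
    *w = (int)wv;
    return 1;
}
```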
7132
7133 #ifndef STBI_NO_BMP
7134 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
7135 {
7136    void *p;
7137    stbi__bmp_data info;
7138
7139    info.all_a = 255;
7140    p = stbi__bmp_parse_header(s, &info);
7141    stbi__rewind( s );
7142    if (p == NULL)
7143       return 0;
7144    if (x) *x = s->img_x;
7145    if (y) *y = s->img_y;
7146    if (comp) {
7147       if (info.bpp == 24 && info.ma == 0xff000000)
7148          *comp = 3;
7149       else
7150          *comp = info.ma ? 4 : 3;
7151    }
7152    return 1;
7153 }
7154 #endif
7155
7156 #ifndef STBI_NO_PSD
7157 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
7158 {
7159    int channelCount, dummy, depth;
7160    if (!x) x = &dummy;
7161    if (!y) y = &dummy;
7162    if (!comp) comp = &dummy;
7163    if (stbi__get32be(s) != 0x38425053) {
7164        stbi__rewind( s );
7165        return 0;
7166    }
7167    if (stbi__get16be(s) != 1) {
7168        stbi__rewind( s );
7169        return 0;
7170    }
7171    stbi__skip(s, 6);
7172    channelCount = stbi__get16be(s);
7173    if (channelCount < 0 || channelCount > 16) {
7174        stbi__rewind( s );
7175        return 0;
7176    }
7177    *y = stbi__get32be(s);
7178    *x = stbi__get32be(s);
7179    depth = stbi__get16be(s);
7180    if (depth != 8 && depth != 16) {
7181        stbi__rewind( s );
7182        return 0;
7183    }
7184    if (stbi__get16be(s) != 3) {
7185        stbi__rewind( s );
7186        return 0;
7187    }
7188    *comp = 4;
7189    return 1;
7190 }
7191
7192 static int stbi__psd_is16(stbi__context *s)
7193 {
7194    int channelCount, depth;
7195    if (stbi__get32be(s) != 0x38425053) {
7196        stbi__rewind( s );
7197        return 0;
7198    }
7199    if (stbi__get16be(s) != 1) {
7200        stbi__rewind( s );
7201        return 0;
7202    }
7203    stbi__skip(s, 6);
7204    channelCount = stbi__get16be(s);
7205    if (channelCount < 0 || channelCount > 16) {
7206        stbi__rewind( s );
7207        return 0;
7208    }
7209    (void) stbi__get32be(s);
7210    (void) stbi__get32be(s);
7211    depth = stbi__get16be(s);
7212    if (depth != 16) {
7213        stbi__rewind( s );
7214        return 0;
7215    }
7216    return 1;
7217 }
7218 #endif
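stbi__psd_info and stbi__psd_is16 walk the fixed 26-byte PSD file header in this order:

- the signature `8BPS` (0x38425053)
- version 1
- six reserved bytes
- the channel count
- height, then width (both 32-bit big-endian)
- bit depth
- color mode, where 3 means RGB

The sketch below applies the same checks to an in-memory header. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static unsigned int psd_be32(const unsigned char *p)
{
    return ((unsigned int)p[0] << 24) | ((unsigned int)p[1] << 16) |
           ((unsigned int)p[2] << 8) | p[3];
}
static int psd_be16(const unsigned char *p) { return (p[0] << 8) | p[1]; }

/* Hypothetical helper: same checks as stbi__psd_info on a 26-byte header. */
static int psd_header_info(const unsigned char *b, size_t len, int *w, int *h, int *depth)
{
    int channels;
    if (len < 26) return 0;
    if (psd_be32(b) != 0x38425053) return 0;    /* "8BPS" */
    if (psd_be16(b + 4) != 1) return 0;         /* version 1 */
    channels = psd_be16(b + 12);                /* bytes 6..11 are reserved */
    if (channels > 16) return 0;
    *h = (int)psd_be32(b + 14);
    *w = (int)psd_be32(b + 18);
    *depth = psd_be16(b + 22);
    if (*depth != 8 && *depth != 16) return 0;
    if (psd_be16(b + 24) != 3) return 0;        /* color mode 3 = RGB */
    return 1;
}
```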
7219
7220 #ifndef STBI_NO_PIC
7221 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
7222 {
7223    int act_comp=0,num_packets=0,chained,dummy;
7224    stbi__pic_packet packets[10];
7225
7226    if (!x) x = &dummy;
7227    if (!y) y = &dummy;
7228    if (!comp) comp = &dummy;
7229
7230    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
7231       stbi__rewind(s);
7232       return 0;
7233    }
7234
7235    stbi__skip(s, 88);
7236
7237    *x = stbi__get16be(s);
7238    *y = stbi__get16be(s);
7239    if (stbi__at_eof(s)) {
7240       stbi__rewind( s);
7241       return 0;
7242    }
7243    if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
7244       stbi__rewind( s );
7245       return 0;
7246    }
7247
7248    stbi__skip(s, 8);
7249
7250    do {
      stbi__pic_packet *packet;

      if (num_packets==sizeof(packets)/sizeof(packets[0])) {
         stbi__rewind( s );
         return 0;
      }

      packet = &packets[num_packets++];
      chained = stbi__get8(s);
      packet->size    = stbi__get8(s);
      packet->type    = stbi__get8(s);
      packet->channel = stbi__get8(s);
      act_comp |= packet->channel;

      if (stbi__at_eof(s)) {
          stbi__rewind( s );
          return 0;
      }
      if (packet->size != 8) {
          stbi__rewind( s );
          return 0;
      }
   } while (chained);

   *comp = (act_comp & 0x10 ? 4 : 3);

   return 1;
}
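/*
   Sketch (not part of stb_image): the two checks stbi__pic_info performs,
   applied to plain values instead of an stbi__context. pic_components walks
   the 4-byte channel packets (chained flag, bits per channel, type, channel
   mask), caps the chain at 10 packets like the packets[10] array above, and
   reports 4 components when any mask has the 0x10 bit. pic_dims_ok is the
   (1 << 28) / x < y guard, which rejects images with more than 2^28 pixels
   without ever forming x * y. Both function names are hypothetical.

```c
#include <stddef.h>

// Walks a chain of 4-byte packets (chained, size, type, channel mask) the way
// stbi__pic_info does. Returns 4 if any packet's mask has the 0x10 bit,
// 3 otherwise, or 0 for a malformed chain.
static int pic_components(const unsigned char *p, size_t len)
{
   int act_comp = 0, num_packets = 0, chained;
   do {
      if (num_packets == 10 || len < 4) return 0; // too many packets, or truncated
      chained = p[0];
      if (p[1] != 8) return 0;                    // only 8 bits per channel
      act_comp |= p[3];                           // accumulate channel masks
      p += 4;
      len -= 4;
      num_packets++;
   } while (chained);
   return (act_comp & 0x10) ? 4 : 3;
}

// Accept only if x * y <= 2^28, tested by division so the product is never
// formed. Assumes x, y >= 0, as returned by stbi__get16be.
static int pic_dims_ok(int x, int y)
{
   return x == 0 || (1 << 28) / x >= y;
}
```
*/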
#endif

// *************************************************************************************************
// Portable Gray Map and Portable Pixel Map loader
// by Ken Miller
//
// PGM: http://netpbm.sourceforge.net/doc/pgm.html
// PPM: http://netpbm.sourceforge.net/doc/ppm.html
//
// Known limitations:
//    Does not support ASCII image data (formats P2 and P3)
//    Does not support 16-bit-per-channel (max value > 255)

#ifndef STBI_NO_PNM

static int      stbi__pnm_test(stbi__context *s)
{
   char p, t;
   p = (char) stbi__get8(s);
   t = (char) stbi__get8(s);
   if (p != 'P' || (t != '5' && t != '6')) {
       stbi__rewind( s );
       return 0;
   }
   return 1;
}

static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
{
   stbi_uc *out;
   STBI_NOTUSED(ri);

   if (!stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n))
      return 0;

   if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");

   *x = s->img_x;
   *y = s->img_y;
   if (comp) *comp = s->img_n;

   if (!stbi__mad3sizes_valid(s->img_n, s->img_x, s->img_y, 0))
      return stbi__errpuc("too large", "PNM too large");

   out = (stbi_uc *) stbi__malloc_mad3(s->img_n, s->img_x, s->img_y, 0);
   if (!out) return stbi__errpuc("outofmem", "Out of memory");
   if (!stbi__getn(s, out, s->img_n * s->img_x * s->img_y)) {
      STBI_FREE(out);
      return stbi__errpuc("bad PNM", "PNM file truncated");
   }

   if (req_comp && req_comp != s->img_n) {
      out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
      if (out == NULL) return out; // stbi__convert_format frees input on failure
   }
   return out;
}
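/*
   Sketch (not part of stb_image): stbi__pnm_load calls stbi__mad3sizes_valid
   before allocating img_n * img_x * img_y bytes. That helper is defined
   elsewhere in this file, so the code below is a hypothetical stand-in that
   shows the idea: check every step of a*b*c + add against INT_MAX by
   division before performing it. mul2_ok, add_ok and mad3_ok are
   illustrative names only.

```c
#include <limits.h>

// 1 if a * b is non-negative and fits in an int.
static int mul2_ok(int a, int b)
{
   if (a < 0 || b < 0) return 0;
   if (b == 0) return 1;
   return a <= INT_MAX / b;
}

// 1 if a + b fits in an int (b must be non-negative).
static int add_ok(int a, int b)
{
   if (b < 0) return 0;
   return a <= INT_MAX - b;
}

// 1 if a * b * c + add can be evaluated without int overflow; the && chain
// forms each product only after the previous check has passed.
static int mad3_ok(int a, int b, int c, int add)
{
   return mul2_ok(a, b) && mul2_ok(a * b, c) && add_ok(a * b * c, add);
}
```
*/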

static int      stbi__pnm_isspace(char c)
{
   return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
{
   for (;;) {
      while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
         *c = (char) stbi__get8(s);

      if (stbi__at_eof(s) || *c != '#')
         break;

      while (!stbi__at_eof(s) && *c != '\n' && *c != '\r' )
         *c = (char) stbi__get8(s);
   }
}

static int      stbi__pnm_isdigit(char c)
{
   return c >= '0' && c <= '9';
}

static int      stbi__pnm_getinteger(stbi__context *s, char *c)
{
   int value = 0;

   while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
      value = value*10 + (*c - '0');
      *c = (char) stbi__get8(s);
   }

   return value;
}

static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
{
   int maxv, dummy;
   char c, p, t;

   if (!x) x = &dummy;
   if (!y) y = &dummy;
   if (!comp) comp = &dummy;

   stbi__rewind(s);

   // Get identifier
   p = (char) stbi__get8(s);
   t = (char) stbi__get8(s);
   if (p != 'P' || (t != '5' && t != '6')) {
       stbi__rewind(s);
       return 0;
   }

   *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm

   c = (char) stbi__get8(s);
   stbi__pnm_skip_whitespace(s, &c);

   *x = stbi__pnm_getinteger(s, &c); // read width
   stbi__pnm_skip_whitespace(s, &c);

   *y = stbi__pnm_getinteger(s, &c); // read height
   stbi__pnm_skip_whitespace(s, &c);

   maxv = stbi__pnm_getinteger(s, &c);  // read max value

   if (maxv > 255)
      return stbi__err("max value > 255", "PPM image not 8-bit");
   else
      return 1;
}
#endif
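/*
   Sketch (not part of stb_image): the header grammar stbi__pnm_info accepts,
   rewritten over a memory buffer so it can be tried in isolation. It reads
   the magic ('P5' gray or 'P6' RGB), then width, height and max value
   separated by whitespace and '#' comments, and rejects max values above 255
   as stbi__pnm_info does. Reading the max value also consumes the single
   whitespace byte after it, so *offset is where the raster data starts.
   pnm_cursor, pnm_header and the other helpers are hypothetical names.

```c
#include <stddef.h>

typedef struct { const unsigned char *p, *end; } pnm_cursor;

// Next byte, or -1 at end of buffer (plays the role of stbi__at_eof).
static int pnm_next(pnm_cursor *c) { return c->p < c->end ? *c->p++ : -1; }

static int pnm_space(int ch)
{
   return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\v' || ch == '\f' || ch == '\r';
}

// Skip whitespace and '#' comments, which run to the end of the line.
static void pnm_skip(pnm_cursor *c, int *ch)
{
   for (;;) {
      while (*ch != -1 && pnm_space(*ch)) *ch = pnm_next(c);
      if (*ch != '#') break;
      while (*ch != -1 && *ch != '\n' && *ch != '\r') *ch = pnm_next(c);
   }
}

static int pnm_int(pnm_cursor *c, int *ch)
{
   int v = 0;
   while (*ch >= '0' && *ch <= '9') {
      v = v * 10 + (*ch - '0');
      *ch = pnm_next(c);
   }
   return v;
}

// Returns 1 for an 8-bit P5/P6 header and fills the outputs; *offset is the
// index of the first raster byte.
static int pnm_header(const unsigned char *buf, size_t len,
                      int *x, int *y, int *comp, int *maxv, size_t *offset)
{
   pnm_cursor c = { buf, buf + len };
   int ch;
   if (pnm_next(&c) != 'P') return 0;
   ch = pnm_next(&c);
   if (ch != '5' && ch != '6') return 0;      // binary PGM / PPM only
   *comp = (ch == '6') ? 3 : 1;
   ch = pnm_next(&c);
   pnm_skip(&c, &ch);
   *x = pnm_int(&c, &ch);
   pnm_skip(&c, &ch);
   *y = pnm_int(&c, &ch);
   pnm_skip(&c, &ch);
   *maxv = pnm_int(&c, &ch);                  // ch now holds the separator byte
   *offset = (size_t)(c.p - buf);
   return *maxv <= 255;
}
```
*/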

static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
{
   #ifndef STBI_NO_JPEG
   if (stbi__jpeg_info(s, x, y, comp)) return 1;
   #endif

   #ifndef STBI_NO_PNG
   if (stbi__png_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_GIF
   if (stbi__gif_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_BMP
   if (stbi__bmp_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PSD
   if (stbi__psd_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PIC
   if (stbi__pic_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PNM
   if (stbi__pnm_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_HDR
   if (stbi__hdr_info(s, x, y, comp))  return 1;
   #endif

   // test tga last because its test is weak: TGA has no magic number, so it is
   // only tried after every format that has one
   #ifndef STBI_NO_TGA
   if (stbi__tga_info(s, x, y, comp))
       return 1;
   #endif
   return stbi__err("unknown image type", "Image not of any known type, or corrupt");
}
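/*
   Sketch (not part of stb_image): the ordering idea behind stbi__info_main,
   reduced to a plain magic-number sniffer. Formats with a fixed signature are
   recognized from their first bytes, and anything unmatched falls through to
   TGA, which has no signature and can only be validated by checking its
   header fields for plausibility. The real *_info functions check more than
   these prefixes (stbi__pnm_info, for example, also parses the dimensions and
   rejects 16-bit files), so treat sniff_format and starts_with, both
   hypothetical names, as an approximation.

```c
#include <stddef.h>
#include <string.h>

// 1 if buf starts with the n-byte signature sig.
static int starts_with(const unsigned char *buf, size_t len, const char *sig, size_t n)
{
   return len >= n && memcmp(buf, sig, n) == 0;
}

// Same order as stbi__info_main: distinctive signatures first, TGA last.
static const char *sniff_format(const unsigned char *b, size_t len)
{
   if (starts_with(b, len, "\xFF\xD8\xFF", 3)) return "jpeg";
   if (starts_with(b, len, "\x89PNG\r\n\x1A\n", 8)) return "png";
   if (starts_with(b, len, "GIF87a", 6) || starts_with(b, len, "GIF89a", 6)) return "gif";
   if (starts_with(b, len, "BM", 2)) return "bmp";
   if (starts_with(b, len, "8BPS", 4)) return "psd";
   if (starts_with(b, len, "\x53\x80\xF6\x34", 4)) return "pic";
   if (starts_with(b, len, "P5", 2) || starts_with(b, len, "P6", 2)) return "pnm";
   if (starts_with(b, len, "#?RADIANCE\n", 11) || starts_with(b, len, "#?RGBE\n", 7)) return "hdr";
   return "tga?";   // no signature: a real check must validate TGA header fields
}
```
*/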

static int stbi__is_16_main(stbi__context *s)
{
   #ifndef STBI_NO_PNG
   if (stbi__png_is16(s))  return 1;
   #endif

   #ifndef STBI_NO_PSD
   if (stbi__psd_is16(s))  return 1;
   #endif

   return 0;
}

#ifndef STBI_NO_STDIO
STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
{
    FILE *f = stbi__fopen(filename, "rb");
    int result;
    if (!f) return stbi__err("can't fopen", "Unable to open file");
    result = stbi_info_from_file(f, x, y, comp);
    fclose(f);
    return result;
}

STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
{
   int r;
   stbi__context s;
   long pos = ftell(f);
   stbi__start_file(&s, f);
   r = stbi__info_main(&s,x,y,comp);
   fseek(f,pos,SEEK_SET);
   return r;
}
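/*
   Sketch (not part of stb_image): stbi_info_from_file records ftell(f) before
   probing and seeks back afterwards, so the caller's FILE position is
   unchanged whether or not a format matched. The same save/restore pattern in
   isolation; peek_bytes is a hypothetical helper, and like
   stbi_info_from_file it assumes the stream is seekable.

```c
#include <stdio.h>

// Reads up to n bytes at the current position of f, then restores that
// position, as stbi_info_from_file does around stbi__info_main.
static size_t peek_bytes(FILE *f, unsigned char *out, size_t n)
{
   long pos = ftell(f);           // save the caller's position
   size_t got;
   if (pos < 0) return 0;         // ftell failed: stream not seekable
   got = fread(out, 1, n, f);
   fseek(f, pos, SEEK_SET);       // restore; a successful fseek also clears EOF
   return got;
}
```
*/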

STBIDEF int stbi_is_16_bit(char const *filename)
{
    FILE *f = stbi__fopen(filename, "rb");
    int result;
    if (!f) return stbi__err("can't fopen", "Unable to open file");
    result = stbi_is_16_bit_from_file(f);
    fclose(f);
    return result;
}

STBIDEF int stbi_is_16_bit_from_file(FILE *f)
{
   int r;
   stbi__context s;
   long pos = ftell(f);
   stbi__start_file(&s, f);
   r = stbi__is_16_main(&s);
   fseek(f,pos,SEEK_SET);
   return r;
}
#endif // !STBI_NO_STDIO

STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
{
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi__info_main(&s,x,y,comp);
}

STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
{
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   return stbi__info_main(&s,x,y,comp);
}

STBIDEF int stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len)
{
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi__is_16_main(&s);
}

STBIDEF int stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *c, void *user)
{
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   return stbi__is_16_main(&s);
}

#endif // STB_IMAGE_IMPLEMENTATION

/*
   revision history:
      2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
      2.19  (2018-02-11) fix warning
      2.18  (2018-01-30) fix warnings
      2.17  (2018-01-29) change stbi__shiftsigned to avoid clang -O2 bug
                         1-bit BMP
                         *_is_16_bit api
                         avoid warnings
      2.16  (2017-07-23) all functions have 16-bit variants;
                         STBI_NO_STDIO works again;
                         compilation fixes;
                         fix rounding in unpremultiply;
                         optimize vertical flip;
                         disable raw_len validation;
                         documentation fixes
      2.15  (2017-03-18) fix png-1,2,4 bug; now all Imagenet JPGs decode;
                         warning fixes; disable run-time SSE detection on gcc;
                         uniform handling of optional "return" values;
                         thread-safe initialization of zlib tables
      2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
      2.13  (2016-11-29) add 16-bit API, only supported for PNG right now
      2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
      2.11  (2016-04-02) allocate large structures on the stack
                         remove white matting for transparent PSD
                         fix reported channel count for PNG & BMP
                         re-enable SSE2 in non-gcc 64-bit
                         support RGB-formatted JPEG
                         read 16-bit PNGs (only as 8-bit)
      2.10  (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
      2.09  (2016-01-16) allow comments in PNM files
                         16-bit-per-pixel TGA (not bit-per-component)
                         info() for TGA could break due to .hdr handling
                         info() for BMP shares code instead of sloppy parse
                         can use STBI_REALLOC_SIZED if allocator doesn't support realloc
                         code cleanup
      2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
      2.07  (2015-09-13) fix compiler warnings
                         partial animated GIF support
                         limited 16-bpc PSD support
                         #ifdef unused functions
                         bug with < 92 byte PIC,PNM,HDR,TGA
      2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
      2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
      2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
      2.03  (2015-04-12) extra corruption checking (mmozeiko)
                         stbi_set_flip_vertically_on_load (nguillemot)
                         fix NEON support; fix mingw support
      2.02  (2015-01-19) fix incorrect assert, fix warning
      2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
      2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
      2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
                         progressive JPEG (stb)
                         PGM/PPM support (Ken Miller)
                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
                         GIF bugfix -- seemingly never worked
                         STBI_NO_*, STBI_ONLY_*
      1.48  (2014-12-14) fix incorrectly-named assert()
      1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
                         optimize PNG (ryg)
                         fix bug in interlaced PNG with user-specified channel count (stb)
      1.46  (2014-08-26)
              fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
      1.45  (2014-08-16)
              fix MSVC-ARM internal compiler error by wrapping malloc
      1.44  (2014-08-07)
              various warning fixes from Ronny Chevalier
      1.43  (2014-07-15)
              fix MSVC-only compiler problem in code changed in 1.42
      1.42  (2014-07-09)
              don't define _CRT_SECURE_NO_WARNINGS (affects user code)
              fixes to stbi__cleanup_jpeg path
              added STBI_ASSERT to avoid requiring assert.h
      1.41  (2014-06-25)
              fix search&replace from 1.36 that messed up comments/error messages
      1.40  (2014-06-22)
              fix gcc struct-initialization warning
      1.39  (2014-06-15)
              fix to TGA optimization when req_comp != number of components in TGA;
              fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
              add support for BMP version 5 (more ignored fields)
      1.38  (2014-06-06)
              suppress MSVC warnings on integer casts truncating values
              fix accidental rename of 'skip' field of I/O
      1.37  (2014-06-04)
              remove duplicate typedef
      1.36  (2014-06-03)
              convert to header file single-file library
              if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
      1.35  (2014-05-27)
              various warnings
              fix broken STBI_SIMD path
              fix bug where stbi_load_from_file no longer left file pointer in correct place
              fix broken non-easy path for 32-bit BMP (possibly never used)
              TGA optimization by Arseny Kapoulkine
      1.34  (unknown)
              use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
      1.33  (2011-07-14)
              make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
      1.32  (2011-07-13)
              support for "info" function for all supported filetypes (SpartanJ)
      1.31  (2011-06-20)
              a few more leak fixes, bug in PNG handling (SpartanJ)
      1.30  (2011-06-11)
              added ability to load files via callbacks to accommodate custom input streams (Ben Wenger)
              removed deprecated format-specific test/load functions
              removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
              error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
              fix inefficiency in decoding 32-bit BMP (David Woo)
      1.29  (2010-08-16)
              various warning fixes from Aurelien Pocheville
      1.28  (2010-08-01)
              fix bug in GIF palette transparency (SpartanJ)
      1.27  (2010-08-01)
              cast-to-stbi_uc to fix warnings
      1.26  (2010-07-24)
              fix bug in file buffering for PNG reported by SpartanJ
      1.25  (2010-07-17)
              refix trans_data warning (Won Chun)
      1.24  (2010-07-12)
              perf improvements reading from files on platforms with lock-heavy fgetc()
              minor perf improvements for jpeg
              deprecated type-specific functions so we'll get feedback if they're needed
              attempt to fix trans_data warning (Won Chun)
      1.23    fixed bug in iPhone support
      1.22  (2010-07-10)
              removed image *writing* support
              stbi_info support from Jetro Lauha
              GIF support from Jean-Marc Lienher
              iPhone PNG-extensions from James Brown
              warning-fixes from Nicolas Schulz and Janez Zemva (i.e. Janez (U+017D)emva)
      1.21    fix use of 'stbi_uc' in header (reported by jon blow)
      1.20    added support for Softimage PIC, by Tom Seddon
      1.19    bug in interlaced PNG corruption check (found by ryg)
      1.18  (2008-08-02)
              fix a threading bug (local mutable static)
      1.17    support interlaced PNG
      1.16    major bugfix - stbi__convert_format converted one too many pixels
      1.15    initialize some fields for thread safety
      1.14    fix threadsafe conversion bug
              header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
      1.13    threadsafe
      1.12    const qualifiers in the API
      1.11    Support installable IDCT, colorspace conversion routines
      1.10    Fixes for 64-bit (don't use "unsigned long")
              optimized upsampling by Fabian "ryg" Giesen
      1.09    Fix format-conversion for PSD code (bad global variables!)
      1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
      1.07    attempt to fix C++ warning/errors again
      1.06    attempt to fix C++ warning/errors again
      1.05    fix TGA loading to return correct *comp and use good luminance calc
      1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
      1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
      1.02    support for (subset of) HDR files, float interface for preferred access to them
      1.01    fix bug: possible bug in handling right-side up bmps... not sure
              fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
      1.00    interface to zlib that skips zlib header
      0.99    correct handling of alpha in palette
      0.98    TGA loader by lonesock; dynamically add loaders (untested)
      0.97    jpeg errors on too large a file; also catch another malloc failure
      0.96    fix detection of invalid v value - particleman@mollyrocket forum
      0.95    during header scan, seek to markers in case of padding
      0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
      0.93    handle jpegtran output; verbose errors
      0.92    read 4,8,16,24,32-bit BMP files of several formats
      0.91    output 24-bit Windows 3.0 BMP files
      0.90    fix a few more warnings; bump version number to approach 1.0
      0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
      0.60    fix compiling as c++
      0.59    fix warnings: merge Dave Moore's -Wall fixes
      0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
      0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
      0.56    fix bug: zlib uncompressed mode len vs. nlen
      0.55    fix bug: restart_interval not initialized to 0
      0.54    allow NULL for 'int *comp'
      0.53    fix bug in png 3->4; speedup png decoding
      0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
      0.51    obey req_comp requests, 1-component jpegs return as 1-component,
              on 'test' only check type, not whether we support this variant
      0.50  (2006-11-19)
              first released version
*/


/*
------------------------------------------------------------------------------
This software is available under 2 licenses -- choose whichever you prefer.
------------------------------------------------------------------------------
ALTERNATIVE A - MIT License
Copyright (c) 2017 Sean Barrett
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
------------------------------------------------------------------------------
ALTERNATIVE B - Public Domain (www.unlicense.org)
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
software, either in source code form or as a compiled binary, for any purpose,
commercial or non-commercial, and by any means.
In jurisdictions that recognize copyright laws, the author or authors of this
software dedicate any and all copyright interest in the software to the public
domain. We make this dedication for the benefit of the public at large and to
the detriment of our heirs and successors. We intend this dedication to be an
overt act of relinquishment in perpetuity of all present and future rights to
this software under copyright law.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
------------------------------------------------------------------------------
*/