LZMA compression explained

by Gautier de Montmollin

This summer vacation's project was completed almost on schedule: write an LZMA encoder whilst enjoying the vacation - that is, work early in the morning and late in the evening when everybody else is sleeping, and have fun (bike, canoe, visiting caves and amazing dinosaur facsimiles, enjoying special beers, ...) the rest of the day.

Well, "schedule" is a bit overstretched, because with a topic as tricky as data compression, it is difficult to tell when and even whether you will succeed...

LZMA is a compression format invented by Igor Pavlov, which combines LZ77 compression with range encoding.

With LZ77, imagine you are copying a text, character by character, but want to take some shortcuts. You send either single characters, or pairs of numbers (distance, length) meaning: "please copy 'length' characters, starting 'distance' characters back in the already-copied text, from the point where the cursor is right now". That's it!
LZ77 is a well-covered subject and is the first stage of most compression algorithms. Basically, you can pick and choose an implementation depending on the compressed size you are aiming for.
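To make the mechanism concrete, here is a minimal sketch of a greedy LZ77 tokenizer - a toy for illustration, not Zip-Ada's actual implementation: the match search is naive and the window is the whole text already seen.

with Ada.Text_IO; use Ada.Text_IO;

procedure Toy_LZ77 is
   Text      : constant String := "abcabcabcabd";
   Min_Match : constant := 3;  --  shorter matches are cheaper as literals
   I : Positive := Text'First;
begin
   while I <= Text'Last loop
      declare
         Best_Len, Best_Dist : Natural := 0;
      begin
         --  Naive search for the longest match starting at position I.
         --  The copy may reach past its own start, as in real LZ77.
         for J in Text'First .. I - 1 loop
            declare
               L : Natural := 0;
            begin
               while I + L <= Text'Last
                 and then Text (J + L) = Text (I + L)
               loop
                  L := L + 1;
               end loop;
               if L > Best_Len then
                  Best_Len  := L;
                  Best_Dist := I - J;
               end if;
            end;
         end loop;
         if Best_Len >= Min_Match then
            Put_Line ("(distance" & Integer'Image (Best_Dist) &
                      ", length" & Integer'Image (Best_Len) & ")");
            I := I + Best_Len;
         else
            Put_Line ("literal '" & Text (I) & "'");
            I := I + 1;
         end if;
      end;
   end loop;
end Toy_LZ77;

On "abcabcabcabd" this emits three literals, then (distance 3, length 8) - a copy longer than its distance is legal and is how LZ77 encodes periodic runs - then a final literal 'd'.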

Range encoding is a fascinating way of compressing a message of any nature. Say you want to send a very large number N, but with fewer digits. It's possible - if some of the digits (0 to 9) appear more frequently than others. The method is the following.
You begin with a range, say [0, 999[.
You subdivide it into ten intervals, corresponding to the digits 0 to 9 and calibrated according to their probabilities of occurrence, p0 .. p9. The first digit of N is perhaps 3, and its corresponding interval is, say, [295, 405[.
Then you continue with the second digit, by subdividing [295, 405[ into ten intervals. If the second digit is 0, you have perhaps now [295, 306[, representing the partial message "30". You see, of course, that if you want to stick with integers (with computers you don't have infinite precision anyway), you quickly lose precision when you set up the ten intervals with the probabilities p0 .. p9. The solution is to append, from time to time, a 0 to the interval's bounds when the width becomes too small. So, if you decide to multiply everything by 10 each time the width is less than 100, the interval for "30" becomes [2950, 3060[.
Some n digits later (after n subdivisions, and some x10 steps when needed), your interval will perhaps look like [298056312, 298056701[. The bounds become larger and larger - a second problem. Solution: you see that the leftmost digits won't change anymore, so you can get rid of them and send them as a chunk of the compressed message. The compression will be better when some symbols are much more frequent than others: the closer a probability is to 1, the better the range width is preserved. If the probability were exactly 1, the width wouldn't change at all, and this trivial message repeating a single symbol wouldn't take any space in its compressed form! It is an absurd case, but it shows why compression methods such as LZMA are extremely good for very redundant data.
That's how the basic range encoding works.
Then, a funny thing is that you can encode a mix of different alphabets (say digits '0' to '9' and letters 'A' to 'Z'), or even the same alphabet but with different probabilities depending on the context, provided the decoder knows what to use when. That's all for range encoding (you will find a more detailed description in the original article [1]).
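Here is a hedged sketch of this decimal toy coder, with made-up probabilities: digit 0 gets probability 0.82, every other digit 0.02. It ships settled leading digits and appends '0's exactly as described above; a real coder must also handle the case where the two bounds keep straddling a digit boundary (the carry/underflow problem), which this toy ignores.

with Ada.Text_IO; use Ada.Text_IO;

procedure Toy_Range_Encoder is
   --  Cum (D) .. Cum (D + 1) delimits the share of digit D, out of Total.
   Total : constant := 100;
   Cum   : constant array (0 .. 10) of Natural :=
     (0, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100);

   type Big is range 0 .. 2**62;
   Low      : Big := 0;
   Width    : Big := 1_000;   --  start with the range [0, 1000[
   N_Digits : Natural := 3;   --  current digit count of the bounds

   Message : constant array (Positive range <>) of Natural :=
     (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0);

   procedure Encode (D : Natural) is
      Base : constant Big := Width * Big (Cum (D)) / Total;
      Next : constant Big := Width * Big (Cum (D + 1)) / Total;
      P    : Big;
   begin
      --  Subdivide: keep only digit D's share of the interval.
      Low   := Low + Base;
      Width := Next - Base;
      --  Ship the leading digits that cannot change anymore:
      loop
         P := 10 ** (N_Digits - 1);
         exit when Low / P /= (Low + Width - 1) / P;
         Put (Character'Val (Character'Pos ('0') + Integer (Low / P)));
         Low      := Low mod P;
         N_Digits := N_Digits - 1;
      end loop;
      --  Append '0's while the interval is too narrow:
      while Width < 100 loop
         Low      := Low * 10;
         Width    := Width * 10;
         N_Digits := N_Digits + 1;
      end loop;
   end Encode;

begin
   for I in Message'Range loop
      Encode (Message (I));
   end loop;
   --  Flush: emit the remaining digits of Low; any value within
   --  [Low, Low + Width[ would complete a decodable stream.
   for K in reverse 1 .. N_Digits loop
      Put (Character'Val
             (Character'Pos ('0') + Integer (Low / 10 ** (K - 1) mod 10)));
   end loop;
   New_Line;
end Toy_Range_Encoder;

The twenty-digit message above comes out as six code digits - the very redundant run of 0's costs almost nothing, as announced.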

LZMA's range encoder works exclusively on a single, binary alphabet (0's and 1's), so the range is always divided into two parts. But it works with lots of contextual probabilities: with some parameter choices, you can have millions of different probabilities in the model! The probabilities are not known in advance, so in this respect LZMA is a purely adaptive compression method: the encoder and the decoder adapt the probabilities as the symbols are sent and received. After each bit is encoded, sent, received and decoded, the entire probability set is (and has to be) in exactly the same state on the encoder's side and on the decoder's side.
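The update rule for one adaptive probability is tiny. Here is a hedged sketch in the style of LZMA's bit models - 11-bit probabilities and an adaptation shift of 5, as in the reference implementation; the range-coding step that would consume P is left out:

with Ada.Text_IO; use Ada.Text_IO;

procedure Adaptive_Bit_Model is
   Prob_Bits : constant := 11;
   Prob_One  : constant := 2 ** Prob_Bits;   --  2048 represents probability 1.0
   Move_Bits : constant := 5;

   type Probability is range 0 .. Prob_One;

   --  P is the modelled probability, scaled by 2048, that the next bit
   --  is a 0.  Encoder and decoder both start at 0.5 and apply exactly
   --  the same update after each bit.
   P : Probability := Prob_One / 2;

   procedure Update (Bit : Natural) is
   begin
      if Bit = 0 then
         P := P + (Prob_One - P) / 2 ** Move_Bits;
      else
         P := P - P / 2 ** Move_Bits;
      end if;
   end Update;

begin
   --  Feed a redundant, all-zero bit sequence: P drifts towards 1.0,
   --  so each further 0 costs the range encoder almost no width.
   for Step in 1 .. 20 loop
      Update (0);
      Put_Line ("P(bit = 0) =" & Float'Image (Float (P) / Float (Prob_One)));
   end loop;
end Adaptive_Bit_Model;

The decoder performs the same update after each decoded bit, which is precisely what keeps the two probability sets in lockstep.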

Developing an encoder from scratch, even if you have open-source code to reproduce, is fun, but debugging it is a pain. A bug feels like a flaw in a maths PhD thesis: no way to get help from anybody or by browsing the Web. By nature, the compressed data will not contain any redundancy that would help you fix bugs. The decoder is confused by faulty compressed data and cannot say why. For range encoding, it is worse: as in the example, the digits sent have nothing visibly to do with the message to be encoded. The interval subdivision, the shipping of the leading interval digits and the appending of trailing '0's occur in a completely asynchronous way. So the good tactic is, as elsewhere, to simplify and divide the issues into the smallest possible pieces.
First, manage to encode an empty message (wow!). It seems trivial, but the range encoder works like a pipeline: you need to initialize it and flush it correctly. Then, an empty message plus the end-of-stream marker. And so on.
Another source of help, specific to LZMA, is the probability set: as said before, it needs to be identical on both sides at every point.

The results of this effort in a few numbers:
  • LZMA.Encoding, started July 28th, first working version August 16th (revision 457).
  • Less than 450 lines - including lots of comments and some debugging code to be removed!
  • 5 bugs had to be fixed.

In my (of course biased) opinion, this is the first LZMA encoder that a normal human can understand by reading the source code.

Zip-Ada's Zip.Compress makes use of LZMA encoding since revision 459.

The source code is available here (main SourceForge SVN repository) or here (GitHub mirror).

Back to the vacation topic (which is what you often do when you're back from vacation): a tourist info sign was just perfect for a 32x32-pixel "info" icon for the AZip archive manager.

The beautiful sign

By the way, some other things are beautiful in this town (St-Ursanne, on the Doubs river)...



____
[1] G. N. N. Martin, Range encoding: an algorithm for removing redundancy
   from a digitized message, Video & Data Recording Conference,
   Southampton, UK, July 24-27, 1979.


Playing with limited-length Huffman trees

by Gautier de Montmollin

A small portion of a Huffman tree
Proper counting is crucial

On the second image you can recognize "DIZAINES" and "UNITÉS" (French for "tens" and "units"). The letters are hardly visible after some 90 years of use... It's a pool table in the legendary Schlauch Restaurant in Zurich.

Back to Huffman trees: I just came across a beautiful algorithm [1] for building a Huffman tree with, as a constraint, a limited code length. With the original Huffman algorithm, parts of the tree may become arbitrarily deep - well, obviously not deeper than the size of the alphabet to be encoded. This algorithm is optimal (given the constraint), fast, and very economical in terms of memory usage.
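For the curious, here is a hedged sketch of the underlying package-merge idea, in its plain, memory-hungry textbook form (the paper's "boundary" variant is the space-economical one; this toy is not the translated implementation):

with Ada.Text_IO; use Ada.Text_IO;

procedure Package_Merge_Demo is
   Max_Length : constant := 3;   --  code length limit; 2**3 >= 6 symbols
   Freq : constant array (1 .. 6) of Positive := (1, 1, 5, 7, 10, 14);
   --  Symbol frequencies, sorted in increasing order.

   N         : constant := Freq'Length;
   Max_Items : constant := 2 * N;

   type Item is record
      Weight : Natural;
      Symbol : Natural;   --  0 for a package
      A, B   : Natural;   --  children, as indices into the previous level
   end record;

   type Level_List is array (1 .. Max_Items) of Item;
   Levels : array (1 .. Max_Length) of Level_List;
   Sizes  : array (1 .. Max_Length) of Natural := (others => 0);

   Code_Length : array (Freq'Range) of Natural := (others => 0);

   --  A selected leaf adds 1 to its symbol's code length; a selected
   --  package stands for its two children in the previous level.
   procedure Expand (Level, Index : Positive) is
      It : Item renames Levels (Level) (Index);
   begin
      if It.Symbol > 0 then
         Code_Length (It.Symbol) := Code_Length (It.Symbol) + 1;
      else
         Expand (Level - 1, It.A);
         Expand (Level - 1, It.B);
      end if;
   end Expand;

begin
   --  Build one sorted list per level: the leaves, merged with the
   --  pairwise packages of the previous level's list.
   for L in 1 .. Max_Length loop
      declare
         Pairs : Natural := 0;
         Li    : Natural := 1;   --  next unused leaf
         Pi    : Natural := 1;   --  next unused package pair
         Size  : Natural := 0;
      begin
         if L > 1 then
            Pairs := Sizes (L - 1) / 2;
         end if;
         while Li <= N or else Pi <= Pairs loop
            Size := Size + 1;
            if Pi > Pairs
              or else (Li <= N and then
                       Freq (Li) <= Levels (L - 1) (2 * Pi - 1).Weight
                                  + Levels (L - 1) (2 * Pi).Weight)
            then
               Levels (L) (Size) := (Freq (Li), Li, 0, 0);
               Li := Li + 1;
            else
               Levels (L) (Size) :=
                 (Levels (L - 1) (2 * Pi - 1).Weight
                    + Levels (L - 1) (2 * Pi).Weight,
                  0, 2 * Pi - 1, 2 * Pi);
               Pi := Pi + 1;
            end if;
         end loop;
         Sizes (L) := Size;
      end;
   end loop;
   --  The optimal length-limited code corresponds to the first
   --  2 * N - 2 items of the last list.
   for I in 1 .. 2 * N - 2 loop
      Expand (Max_Length, I);
   end loop;
   for S in Freq'Range loop
      Put_Line ("symbol" & Integer'Image (S) &
                ": code length" & Integer'Image (Code_Length (S)));
   end loop;
end Package_Merge_Demo;

With these six frequencies and a 3-bit limit, the program prints code lengths (3, 3, 3, 3, 2, 2); the unrestricted Huffman algorithm would have given the two rarest symbols 4-bit codes.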

There is an equally beautiful implementation [2], translated for the purpose of the Zip-Ada project - a proper dynamic Deflate for compression is still missing there, and a limited-length Huffman tree building algorithm is needed for that.
The translation is here: specification, body, test procedure - and abstract enough to build with an Ada 83 compiler (at least GNAT in -gnat83 mode)!
___
[1] Search: "A Fast and Space-Economical Algorithm for Length-Limited Coding"
[2] Search: "katajainen.c" (this part of the "Zopfli" project)

Deflating...

by Gautier de Montmollin

...not in the economical sense fortunately, except that it's all about economy of bits.

Following the last post about limited-length Huffman trees, there is now the very early development of a compression algorithm for the so-called "Dynamic Deflate" compression format. From revision #297, the code (checkout here) seems to compress data correctly ("correctly" meaning that the data is correctly decoded, even by buggy but popular decoders...).
The efficiency of the compression is in the works. Stay tuned!

Testing is welcome: build and use the ZipAda command-line tool, or use the Deflate_1 method in your code using the Zip-Ada library.


Visual Deflating

by Gautier de Montmollin

When an image is compressed as a bitmap with the Deflate format, with dynamically computed Huffman trees, you may get this when dumping the bit length of each code (columns) for successive chunks of the image (rows):

Literals / LZ Lengths / LZ Distances
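Such a dump is easy to produce. Here is a hedged sketch with made-up bit lengths (the real experiment uses the code sets computed during compression): one row per chunk, one column per code, one gray value per bit length, written as a portable graymap (PGM):

with Ada.Text_IO; use Ada.Text_IO;

procedure Dump_Bit_Lengths is
   Chunks : constant := 4;
   Codes  : constant := 8;
   --  Hypothetical bit length of each code (column), for each
   --  successive chunk of the input (row):
   Len : constant array (1 .. Chunks, 1 .. Codes) of Natural :=
     ((8, 7, 3, 2, 4, 6, 9,  9),
      (8, 7, 3, 2, 4, 5, 9, 10),
      (6, 6, 4, 2, 4, 5, 8, 10),
      (6, 6, 4, 3, 3, 5, 8, 10));
   F : File_Type;
begin
   Create (F, Out_File, "bit_lengths.pgm");
   Put_Line (F, "P2");   --  plain, human-readable graymap
   Put_Line (F, Integer'Image (Codes) & Integer'Image (Chunks));
   Put_Line (F, "15");   --  Deflate's maximum code length appears white
   for Row in 1 .. Chunks loop
      for Col in 1 .. Codes loop
         Put (F, Natural'Image (Len (Row, Col)));
      end loop;
      New_Line (F);
   end loop;
   Close (F);
end Dump_Bit_Lengths;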
Current research with the Zip-Ada project.

Zip-Ada v.50

by Gautier de Montmollin

There is a new version of Zip-Ada at http://unzip-ada.sf.net .

*** 

In a nutshell, there are now, finally, fast *and* efficient compression methods available.

* Changes in '50', 31-Mar-2016:
  - Zip.Compress.Shrink is slightly faster
  - Zip.Compress.Deflate has new compression features:
     - Deflate_Fixed is much faster, with slightly better compression
     - Deflate_1 was added: strength similar to zlib, level 6
     - Deflate_2 was added: strength similar to zlib, level 9
     - Deflate_3 was added: strength similar to 7-Zip, method=deflate, level 5

I use the term "similar" because the compression strength depends on the algorithms used and on the data, so it may differ from case to case. In the following charts, we have a comparison on the two best-known benchmark data sets ("corpora"), where the similarity with zlib (= Info-Zip, prefix iz_ below) holds, but not at all the similarity with 7-Zip-with-Deflate.
In blue, you see non-Deflate formats (BZip2 and LZMA), just to remind you that the world doesn't stop at Deflate, although Deflate is the topic of this article.
In green, you have Zip archives made by Zip-Ada.

(Charts: compressed sizes per format on the two corpora.)

Here is the biggest surprise I've had while testing randomly chosen data: a 162 MB sparse integer matrix (among a bunch of results for a Kaggle challenge), which is very redundant data. First, 7-Zip in Deflate mode gives a comparatively poor compression ratio - don't worry for 7-Zip: the LZMA mode, native to 7-Zip, is second best in the list. The most surprising aspect is that the Shrink format (the LZW algorithm) has a compressed size only 5.6% larger than the best Deflate (here, KZip).

(Chart: compressed sizes for the sparse matrix file.)

Typically, the penalty for LZW (used for GIF images) is from 25% to 100% compared to the best Deflate (used for PNG images). Of course, at the other end of the redundancy spectrum, data which is closer to random is also more difficult to compress, and the differences between LZW and Deflate inevitably narrow.

About Deflate

As you perhaps know, the Deflate format, invented in the early 1990s by the late Phil Katz for his PKZip program, performs compression in two steps, combining an LZ77 algorithm with Huffman encoding.
In this edition of Zip-Ada, two known algorithms (one for LZ77, one for finding an appropriate Huffman encoding from an alphabet's statistics) are combined, probably for the first time, within the same software.
Additionally, the boundaries of the compressed blocks are determined by an original algorithm (the Taillaule algorithm), based on similarities between Huffman code sets.

*** 

Zip-Ada is a library for dealing with the Zip compressed archive
file format. It supplies:

 - compression with the following sub-formats ("methods"):
     Store, Reduce, Shrink (LZW) and Deflate
 - decompression for the following sub-formats ("methods"):
     Store, Reduce, Shrink (LZW), Implode, Deflate, BZip2 and LZMA
 - encryption and decryption (portable Zip 2.0 encryption scheme)
 - unconditional portability - within the limits of the compiler's provided
     integer types and the target architecture's capacity
 - input (archive to decompress or data to compress) can be any data stream
 - output (archive to build or data to extract) can be any data stream
 - types Zip_info and Zip_Create_info to handle archives quickly and easily
 - cross-format compatibility with a wide variety of tools and file formats
     based on the Zip format: 7-zip, Info-Zip's Zip, WinZip, PKZip, Java's
     JARs, OpenDocument files, MS Office 2007+, Nokia themes, and many others
 - task safety: this library can be used ad libitum in parallel processing
 - endian-neutral I/O

Enjoy!

The application is damaged and can't be opened

by Forward in Code

I downloaded a new version of eclipseArduino (Arduino development within Eclipse). After unpacking it and moving it to /Applications, I get the message

"eclipseArduino" is damaged and can't be opened. You should move it to the Trash.

Huh.

After a lot of poking around, I found this:

  1. Open Gatekeeper settings located in System Preferences > Security & Privacy.
  2. Set "Allow applications downloaded from:" to "Anywhere" and confirm by pressing "Allow From Anywhere".
  3. Run the application.
  4. Once the application has been launched successfully, it no longer goes through Gatekeeper; so restore the Gatekeeper setting to the default option, "Mac App Store and identified developers".

Audacity 2.1.2 and El Capitan

by Forward in Code

I wanted to update Audacity, but the 2.1.2 version (from the official download site) wouldn’t run: I got "can’t open application Audacity".


Free tools and libraries updates

by AdaIC

Here are some recent updates to our Free Tools and Libraries page:

August 11, 2016: Added PUGIXML Ada, a set of Ada bindings to PUGIXML, a lightweight XML processing library.
Added ZStd for Ada, Ada bindings to ZStandard, a new lossless compression algorithm.
July 19, 2016: Added Libsodium-ada, a set of thick Ada bindings to libsodium. Libsodium is a portable implementation of the NaCl encryption, hashing, and authentication library.
June 22, 2016: Added Container JSON, utilities for serializing/deserializing Ada containers to/from JSON.
Added SymExpr, a generic package for manipulating simple symbolic expressions.
May 19, 2016: Added AdaBase, a new database interface for Ada.
Added Imago, a binding to DevIL (a universal image handling library).
(This post will be periodically updated – Webmaster.)

Update to the Ada 2012 Rationale available

by AdaIC

The Rationale Update for Ada 2012, based on an article in the Ada User Journal by John Barnes, organizes the changes of Technical Corrigendum 1 for Ada 2012 into the same chapters as the Ada 2012 Rationale. It is an essential companion to the Rationale document. It is available on the Ada Conformity Assessment Authority website in a variety of formats.

GLOBE_3D: now, a bit of fog...

by Gautier de Montmollin

(Screenshot: a GLOBE_3D scene with fog in the background.)

Here is the code activating the fog in the background.

    if foggy then
      Enable (FOG);                                      --  switch fog on
      Fog (FOG_MODE, LINEAR);                            --  linear fade between the start and end depths
      Fog (FOG_COLOR, fog_colour(0)'Unchecked_Access);   --  the fog's colour
      Hint (FOG_HINT, FASTEST);                          --  favour speed over quality
      Fog (FOG_START, 1.0);                              --  no fog before this depth
      Fog (FOG_END, 0.4 * fairly_far);                   --  full fog beyond this depth
    end if;

As usual with GL, it looks very obvious, but (as usual, too) it is one of the few combinations that actually work.

GLOBE_3D Release 2016-07-05 - "Blender edition"

by Gautier de Montmollin

GLOBE_3D is a GL Object Based 3D engine realized with the Ada programming language.
URL: http://globe3d.sf.net

Latest additions:
  • Use of Generic Image Decoder (GID) in GL.IO; now most image formats are supported for textures and other bitmaps to be used with GLOBE_3D (or any GL app)
  • New Wavefront format (.obj / .mtl) importer
  • Doom 3 / Quake 4 map importer more complete
  • Unified GNAT project file (.gpr), allowing selection of the target Operating System (Windows, Linux, Mac) and of the compilation mode (fast, debug, small) for demos, tools, etc.
  • Project file for ObjectAda 9.1+ updated
The first two points facilitate the import of 3D models from software such as Blender.
Here is an example:
(Screenshot: a model imported from Blender.)
Incidentally, the Wavefront file format is so simple that you can also write 3D models "by hand" in that format. An example made in an Excel sheet is provided along with the importer, in the ./tools/wavefront directory.

(Screenshot: the Wavefront example written by hand in an Excel sheet.)
Enjoy!

GLOBE_3D: non-convex objects with transparency

by Gautier de Montmollin

It's stunning how the inventors of GL addressed, from the beginning in 1991, subtle issues that pop up 25 years later when you display 3D objects in your own program.
For instance, take this model:

No alpha test.


It is a cross-shaped object (seen from above); its texture has lots of transparency.
In the red rectangle you see the issue: the face in front was displayed before the face behind.
There is no bullet-proof rule for sorting faces, and GL has a per-screen-pixel depth buffer that allows displaying faces in an arbitrary order. So we don't want to introduce imperfect face sorting just for dealing with this kind of object.
Fortunately, the GL geniuses have invented a solution for that issue too:
    Enable    (ALPHA_TEST);     --  fragments can be discarded based on their alpha value
    AlphaFunc (GREATER, 0.05);  --  keep only fragments with alpha > 0.05
Et voilà...
Alpha test.
The model, "herbe01.obj", is in the ./tools/wavefront directory of the GLOBE_3D repository.

GLOBE_3D is a GL Object Based 3D engine realized with the Ada programming language.

URL: http://globe3d.sf.net/


Make with Ada

by Ada in Denmark

AdaCore announced the embedded Ada programming competition “Make with Ada” earlier this week.  I hope to see many interesting entries (and plan to do something myself as well).


GLOBE_3D: most image formats now available for textures

by Gautier de Montmollin

The texture loader in GL.IO was around 15 years old and supported only the Targa (.tga) format for textures, plus a few sub-formats of Windows bitmaps (.bmp).
In order to make things easy when dealing with various models, e.g. those imported from Blender, the old image-reading code has been wiped out and the loader now uses GID for the job, supporting JPEG and PNG in addition. For instance, the Blender model below uses the JPEG format for its textures.

Futuristic Combat Jet (hi poly version) by Dennis Haupt (DennisH2010)

The following Blender model has a single PNG texture projected on a complicated surface called a Mandelbulb (never heard of it before!):

Mandelbulb 3D Panorama 3 by DennisH2010


GLOBE_3D is a GL Object Based 3D engine realized with the Ada programming language.

URL: http://globe3d.sf.net/


Wavefront importer for GLOBE_3D

by Gautier de Montmollin

Basically, it is now possible to import a model saved in Blender as a Wavefront (.obj) model and turn it into a GLOBE_3D object:

Futuristic Combat Jet by Dennis Haupt (DennisH2010)

GLOBE_3D is a GL Object Based 3D engine realized with the Ada programming language.

URL: http://globe3d.sf.net/

Multitexturing with GLOBE_3D - Doom 3 scene

by Gautier de Montmollin

Here is the effect of adding specular maps to the polygons' textures.

Before


After

The effect is better seen in motion. Click here for a video capture.

GLOBE_3D is a GL Object Based 3D engine realized with the Ada programming language.
URL: http://globe3d.sf.net/


Visual debugging with GLOBE_3D

by Gautier de Montmollin

Texture names
Portal labels
More to come soon...

GLOBE_3D is a GL Object Based 3D engine realized with the Ada programming language.
URL: http://globe3d.sf.net/


Multitexturing with GLOBE_3D

by Gautier de Montmollin

First success with multitexturing (*) in the GLOBE_3D engine.
Commit on SourceForge: click here.



More to come soon...

GLOBE_3D is a GL Object Based 3D engine realized with the Ada programming language.
URL: http://globe3d.sf.net/

__
(*) several images are painted on the same polygon, but with different light reflection properties.

An HD video capture from GLOBE_3D

by Gautier de Montmollin

No news on this project - except that my hardware for testing it is "new" (2012) and opens possibilities for nicer and smoother video captures.
Enjoy!


On my to-do list (for many years now): add multi-texture rendering (specular reflection in addition to diffuse reflection), to make GLOBE_3D look a little bit more like a 21st-century 3D engine...

Using the Ada Wiki Engine

by Java 2 Ada

The Ada Wiki Engine is a small Ada library that parses Wiki text in several Wiki syntaxes, such as MediaWiki, Creole and Markdown, and renders the result either in HTML, in text, or into another Wiki format. The Ada Wiki Engine is used in two steps:

  1. The Wiki text is parsed according to its syntax to produce a Wiki Document instance.
  2. The Wiki document is then rendered by a renderer to produce the final HTML or text.

The Ada Wiki Engine does not manage any storage for the wiki content, so it focuses only on the parsing and rendering aspects.

Overview

The Ada Wiki engine is organized in several packages:

  • Several Wiki stream packages define the interface, types and operations for the Wiki engine to read the Wiki or HTML content and for the Wiki renderer to generate the HTML or text outputs.
  • The Wiki parser is responsible for parsing HTML or Wiki content according to a selected Wiki syntax. It builds the final Wiki document through filters and plugins.

(Diagram: the Ada Wiki Engine packages.)

  • The Wiki filters provide a simple filter framework that allows plugging specific filters in when a Wiki document is parsed and processed. Filters are used for the table-of-contents generation, for HTML filtering, to collect words or links, and so on.
  • The Wiki plugins define the plugin interface that is used by the Wiki engine to provide pluggable extensions in the Wiki. Plugins are used for the Wiki template support, to hide some Wiki text content when it is rendered, or to interact with other systems.
  • The Wiki documents and attributes are used for the representation of the Wiki document after the Wiki content is parsed.
  • The Wiki renderers are the last packages; they are used to render the Wiki document and produce the final HTML or text.

Building Ada Wiki Engine

Download the ada-wiki-1.0.1.tar.gz or get the sources from GitHub:

git clone git@github.com:stcarrez/ada-wiki.git ada-wiki

If you are using Ada Utility Library then you can configure with:

./configure

Otherwise, you should configure with:

./configure --with-ada-util=no

Then, build the library:

make

Once complete, you can install it:

make install

To use the library in your Ada project, add the following line in your GNAT project file:

with "wiki";

Rendering example

The rendering example described in this article generates HTML or text content from a Wiki source file. The example reads a file in one of the supported Wiki syntaxes and produces the HTML or text. You will find the source file on GitHub in render.adb. The example has the following usage:

Render a wiki text file into HTML (default) or text
Usage: render [-t] [-m] [-M] [-d] [-g] [-c] [-s style] {wiki-file}
  -t        Render to text only
  -m        Render a Markdown wiki content
  -M        Render a Mediawiki wiki content
  -d        Render a Dotclear wiki content
  -g        Render a Google wiki content
  -c        Render a Creole wiki content
  -s style  Use the CSS style file

Parsing a Wiki Text

To render a Wiki text you will first need to parse the Wiki text and produce a Wiki document instance. For this you will need to declare the Wiki document instance and the Wiki parser instance:

with Wiki.Documents;
with Wiki.Parsers;
...
   Doc      : Wiki.Documents.Document;
   Engine   : Wiki.Parsers.Parser;

The Ada Wiki Engine has a filter mechanism that is used while parsing the input and before building the target wiki document instance. Filters are chained together and a filter can do some work on the content it sees such as blocking some content (filtering), collecting some data and doing some transformation on the content. When you want to use a filter, you have to declare an instance of the corresponding filter type.

with Wiki.Filters.Html;
with Wiki.Filters.Autolink;
with Wiki.Filters.TOC;
...
   Filter   : aliased Wiki.Filters.Html.Html_Filter_Type;
   Autolink : aliased Wiki.Filters.Autolink.Autolink_Filter;
   TOC      : aliased Wiki.Filters.TOC.TOC_Filter;

We use the Autolink filter, which detects links in the text and transforms them into real links. The TOC filter is used to collect the header sections in the Wiki text and build a table of contents. The Html filter is used to filter HTML content that could be contained in a Wiki text. By default it ignores several HTML tags such as html, head, body, title, meta (these tags are silently discarded). Furthermore, it has the ability to hide several elements such as style and script (the tag and its content are discarded).

You will then configure the Wiki engine to build the filter chain and then define the Wiki syntax that the parser must use:

Engine.Add_Filter (TOC'Unchecked_Access);
Engine.Add_Filter (Autolink'Unchecked_Access);
Engine.Add_Filter (Filter'Unchecked_Access);
Engine.Set_Syntax (Syntax);

The Wiki engine gets its input from an Input_Stream interface that only defines a Read procedure. The Ada Wiki Engine provides several implementations of that interface, one of them is based on the Ada Text_IO package. This is what we are going to use:

with Wiki.Streams.Text_IO;
...
   Input    : aliased Wiki.Streams.Text_IO.File_Input_Stream;

You will then open the input file. If the file contains UTF-8 characters, you may open it as follows:

Input.Open (File_Path, "WCEM=8");

where File_Path is a string that represents the file's path.

Once the Wiki engine is setup and the input file opened, you can parse the Wiki text and build the Wiki document:

Engine.Parse (Input'Unchecked_Access, Doc);

Rendering a Wiki Document

After parsing a Wiki text you get a Wiki.Documents.Document instance that you can use as many times as you want. To render the Wiki document, you will first choose a renderer according to the target format that you need. The Ada Wiki Engine provides three renderers:

  • A Text renderer that produces text outputs,
  • An HTML renderer that generates an HTML presentation for the document,
  • A Wiki renderer that generates various Wiki syntaxes.

The renderer needs an output stream instance. We are using the Text_IO implementation:

with Wiki.Streams.Html.Text_IO;
with Wiki.Render.Html;
...
   Output   : aliased Wiki.Streams.Html.Text_IO.Html_File_Output_Stream;
   Renderer : aliased Wiki.Render.Html.Html_Renderer;

You then configure the renderer to tell it which output stream to use. You may enable or disable the rendering of the table of contents, and then you just use the Render procedure to render the document.

Renderer.Set_Output_Stream (Output'Unchecked_Access);
Renderer.Set_Render_TOC (True);
Renderer.Render (Doc);

By default, the output stream is configured to write to the standard output. This means that when Render is called, the output will be written to the standard output. You can choose another output stream, or open the output stream to a file, according to your needs.
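Putting the pieces together, here is a minimal program assembled from the snippets above - a sketch, not the project's official example: the syntax enumeration value below is assumed (check render.adb for the actual names) and error handling is omitted.

with Wiki;
with Wiki.Documents;
with Wiki.Parsers;
with Wiki.Filters.Html;
with Wiki.Filters.Autolink;
with Wiki.Filters.TOC;
with Wiki.Streams.Text_IO;
with Wiki.Streams.Html.Text_IO;
with Wiki.Render.Html;

procedure Render_One_File is
   Doc      : Wiki.Documents.Document;
   Engine   : Wiki.Parsers.Parser;
   Filter   : aliased Wiki.Filters.Html.Html_Filter_Type;
   Autolink : aliased Wiki.Filters.Autolink.Autolink_Filter;
   TOC      : aliased Wiki.Filters.TOC.TOC_Filter;
   Input    : aliased Wiki.Streams.Text_IO.File_Input_Stream;
   Output   : aliased Wiki.Streams.Html.Text_IO.Html_File_Output_Stream;
   Renderer : aliased Wiki.Render.Html.Html_Renderer;
begin
   --  Build the filter chain and select the input syntax:
   Engine.Add_Filter (TOC'Unchecked_Access);
   Engine.Add_Filter (Autolink'Unchecked_Access);
   Engine.Add_Filter (Filter'Unchecked_Access);
   Engine.Set_Syntax (Wiki.SYNTAX_MARKDOWN);  --  name assumed, see render.adb

   --  Parse the UTF-8 input file into a Wiki document:
   Input.Open ("page.md", "WCEM=8");
   Engine.Parse (Input'Unchecked_Access, Doc);

   --  Render the document as HTML, here on the standard output:
   Renderer.Set_Output_Stream (Output'Unchecked_Access);
   Renderer.Set_Render_TOC (True);
   Renderer.Render (Doc);
end Render_One_File;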

Conclusion

The Ada Wiki Engine can be used to parse HTML content, sanitize the result through the HTML filter, and convert it to text or to some Wiki syntax (have a look at the import.adb example). The engine can be extended through filters or plugins, thus providing a flexible architecture. The library does not impose any storage mechanism. The Ada Wiki Engine is the core engine used by the AWA Blogs and AWA Wiki web applications. You may have a look at some online Wikis in the Atlas Wiki demonstrator.