PostScript was invented by Adobe, and released way back in 1985. As a page description language, PostScript earned most of its money for Adobe through license fees from OEMs who embedded it in their printers. For years the exceptional quality of PostScript was only available in these consequently expensive printers, keeping it out of the hands of home users. In many ways, Adobe's PostScript licensing fees delayed the onset of desktop publishing by almost a decade. This left home users with various proprietary character-coded control sequences, depending on which printer they owned. The Epson and C. Itoh escape codes, and later HP's PCL, provided at least some commonality, but what we really wanted was PostScript.
Fast forward to 1993: with PostScript licensing fees yet to drop to home-market levels, Adobe announced the "Portable Document Format" (PDF), or more tellingly the "PostScript Document Format". PDF is essentially a superset of PostScript, with additional support for screen display and contextual cues for text manipulation. It also provides compression, a necessity given PostScript's verbose textual syntax and the fairly low-bandwidth Internet connections typical at the time. Newer versions of PDF have moved further toward screen-based hypermedia with a facility called Tagged PDF, a method for embedding HTML-like structure in a PDF and letting the PDF reader make all the rendering decisions. Somewhat ironically, that's the opposite direction to web browsers, which have been trying to get away from generic rendering virtually since they first appeared.
PDF uses two encoding formats for text: the built-in PDFDocEncoding, and Unicode encoded as UTF-16BE. UTF-16BE is the easiest, assuming you have an incoming Unicode encoding, because it is mostly a direct serial encoding of the 16-bit Unicode values. But if you haven't got a UTF-16BE stream available, then PDFDocEncoding is what you'll be using.
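As a concrete illustration, here's a minimal Python sketch (my own, nothing from Adobe's tooling) of serialising a string as a UTF-16BE PDF text string; the PDF spec has readers recognise the encoding by a leading byte order mark:

```python
# A minimal sketch: a UTF-16BE PDF text string is the byte order mark
# FE FF followed by the raw big-endian 16-bit Unicode values.
def pdf_utf16be_text(text: str) -> bytes:
    return b"\xfe\xff" + text.encode("utf-16-be")

print(pdf_utf16be_text("\u2014").hex(" "))  # fe ff 20 14
```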
With PDFDocEncoding, you need to know which character codes you're going to be using, so you can map them to the appropriate PostScript and font glyphs.
For example, in the Code Page 1252 encoding (CP1252), the value hex 97 is an em dash. CP1252 is Microsoft Windows' informal extended version of the standard ISO-8859-1 character encoding, the latter lacking en and em dashes and several other significant typesetting glyphs. The Unicode representation of an em dash is U+2014, but Unicode is just a numbering system; to represent it in memory, you need to pick an encoding. In UTF-16BE (Big Endian), it maps directly to the two-byte sequence hex 20 14, but in UTF-8 it gets coded as the three-byte sequence hex e2 80 94.
PDF doesn't provide a way to convert UTF-8 sequences to glyphs (e.g. e2 80 94 = em dash), so if you're not using UTF-16BE, you need to convert any UTF-8 encoded text to a single-byte encoding, and provide an appropriate glyph mapping table inside the PDF.
| Encoding | Em dash |
|---|---|
| CP1252 | hex 97 |
| ISO-8859-1 | n/a |
| Unicode | U+2014 |
| UTF-8 | hex e2 80 94 |
| UTF-16BE | hex 20 14 |
| UTF-16LE | hex 14 20 |
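If you have Python handy, the table is easy to verify for yourself (the codec names below are the standard library's spellings):

```python
em = "\u2014"                        # em dash, Unicode U+2014
print(em.encode("cp1252").hex())     # 97
print(em.encode("utf-8").hex())      # e28094
print(em.encode("utf-16-be").hex())  # 2014
print(em.encode("utf-16-le").hex())  # 1420
em.encode("iso-8859-1")              # raises UnicodeEncodeError: no em dash
```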
In a *nix environment, the iconv program will convert between a whole range of encodings, but writing your own converter is also simple for single-byte encodings, such as code pages and other extended-ASCII formats. I'm not going to talk about the specifics of character encoding and Unicode, because there's enough information out there already; suffice to say that code pages in various forms have been around since the early 1970s, and were used on most micros and mainframes to map byte values to glyphs on a screen. More recently, Microsoft has more or less owned the idea, thanks to the MS-DOS legacy.
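To give a feel for how little is involved, here's a hand-rolled decoding sketch; I've only filled in a handful of CP1252's extras, where a real table would cover all of hex 80 to 9f:

```python
# A single-byte decoder is just a 256-entry lookup table.
# Only a few of CP1252's non-ISO-8859-1 extras are filled in here.
CP1252_EXTRAS = {
    0x91: "\u2018",  # left single quote
    0x92: "\u2019",  # right single quote
    0x93: "\u201c",  # left double quote
    0x94: "\u201d",  # right double quote
    0x96: "\u2013",  # en dash
    0x97: "\u2014",  # em dash
}

def cp1252_to_unicode(data: bytes) -> str:
    # Bytes below hex 80 are plain ASCII; anything else comes from the
    # table, falling back to ISO-8859-1's direct byte-to-code-point map.
    return "".join(CP1252_EXTRAS.get(b, chr(b)) for b in data)

print(cp1252_to_unicode(b"em dash \x97"))  # em dash —
```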
At Synop, our Sytadel platform uses HTMLDOC to generate PDFs from microcontent encoded as UTF-8, but HTMLDOC doesn't yet support UTF-16BE, so we pre-convert our incoming UTF-8 data to CP1252 before passing it to HTMLDOC to remap inside the PDF. The round trip is interesting, because the data was originally stored in a Microsoft Access database, and was converted from CP1252 to UTF-8 when it was exported for use in Sytadel.
Conversion between character encodings tends to be lossy, depending on the scope of each encoding. But given the scope of Unicode, and our originating data being encoded in the fairly limited CP1252, this isn't a problem we've really had to deal with.
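The conversion step itself amounts to something like this sketch (illustrative names, not our actual Sytadel code); the errors argument is where any lossiness would surface, and because our data began life as CP1252, strict mode never trips for us:

```python
def utf8_to_cp1252(raw: bytes) -> bytes:
    # Decode the incoming UTF-8, then re-encode as CP1252 for HTMLDOC.
    # Characters with no CP1252 equivalent are the lossy part; pass
    # errors="replace" instead if you need to tolerate them.
    return raw.decode("utf-8").encode("cp1252", errors="strict")

print(utf8_to_cp1252(b"em dash \xe2\x80\x94"))  # b'em dash \x97'
```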
Character set encodings can be a tricky beast if you don't know how they work, as most embed a common 7-bit ASCII subset, which can hide some of the more obscure encoding problems. But if you know the background, character set encodings are, like most technologies, quite simple to understand and master.
This is a test vog (sic), which I'm intentionally leaving as an enclosure only. I've spoken before about enclosures in the post Podcasting / feedcasting. Still not getting it…, but this time I'm upping the ante from audio to video. Doing my bit to push QuickTime and MPEG-4 as a video standard. Victor, this one's for you.
Last Saturday L* and I went to WaveAid, a benefit concert for victims of the 2004 Indian Ocean earthquake (commonly referred to as "the tsunami"), with 11 (predominantly) Australian acts donating their time to help raise money, including the return of silverchair and Midnight Oil.
I was mainly there to see the Oils, having not seen them on their final tour before they broke up a few years back. But before they came on later that night, I was amazed to hear the Finn Brothers do acoustic versions of a bunch of Split Enz (finishing with I Got You) and Crowded House songs, and have most of the SCG singing along. That's roughly 47000 people. I didn't realise Crowded House were so popular, or that so many of their songs had become part of Australian rock history. Anyway…
Now, when I say 47000 people: it's actually impossible to hear that many people sing at the same time, but not for the reasons you're probably thinking.
Sound travels at roughly 340 metres per second, a touch faster on a hot day like that one (all that body heat helps too). The SCG is at least 200 metres from the main stage to the top of the far stands, meaning it takes about 600 milliseconds (over half a second) for the sound to travel the full distance. Light, by contrast, covers 200 metres effectively instantaneously, so a person at the back won't hear what they see until over half a second after the fact. Half a second doesn't sound like much, but to our senses it's plenty, especially when trying to sing along to visible guitar strumming.
To get the main-stage sound to the very back, you're either going to permanently deafen the people down the front, or you'll have to place extra speakers at the back. Of course the latter is the case, and at the SCG you'll find about half a dozen speaker positions around the ground, staggered at various distances from the stage. If you simply pumped the same signal into all of them, the sound would come out in almost perfect time with what people are seeing, at all distances, which is what we want. However, the people at the back would now hear the immediate sound plus six other copies, each arriving from a different distance and offset by about 100 milliseconds apiece, causing an almost deafening echo or reverb. You've probably experienced a smaller version of this at a school sports carnival, where multiple speakers are strung around an oval.
The solution is to use digital delays, holding back the sound coming out of each speaker by the time it takes sound to travel from the main speakers to that speaker. So if you have a speaker 100 metres back, the PA will be configured to delay its signal by about 300 milliseconds. Likewise the back speakers, 200 metres out, will be delayed by 600 milliseconds. This way the sound from every speaker arrives at any given listener at the same time as the sound from the actual performance.
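The arithmetic is simple enough to sketch; the speaker distances here are hypothetical, and 340 m/s is the same round figure from above:

```python
SPEED_OF_SOUND = 340.0  # metres per second, roughly

def fill_speaker_delay_ms(distance_from_stage_m: float) -> float:
    # Delay a fill speaker so its output lands in step with the sound
    # arriving acoustically from the main stage.
    return distance_from_stage_m / SPEED_OF_SOUND * 1000

for d in (50, 100, 150, 200):  # hypothetical speaker positions
    print(f"{d:>3} m -> {fill_speaker_delay_ms(d):.0f} ms delay")
# 50 m -> 147 ms, 100 m -> 294 ms, 150 m -> 441 ms, 200 m -> 588 ms
```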
Of course sound is omnidirectional, so sound out of the back speakers will also travel forward, meaning the distance between each speaker needs to be carefully calculated.
So if you’re at the back, what you see on stage is still going to be out of sync with the audio, but at least the sound will be clear and have no noisy echo. Unless of course you have a PA which is able to send audio 600 milliseconds forward in time.
What this means is that when the audience sings along, not only will the front and back of the audience actually be out of sync due to the delayed audio, but they'll be gradually staggered from front to back, like a Mexican wave. Because of this staggering, only small sections of the audience are ever in sync, and so no section is vocally loud enough to make an audible impact on the others. So, 47000 people at once? More like 1000 people at once. Although if the people down the front had super hearing, they'd be able to hear the singing at the back, and wonder why it was delayed by 1.2 seconds.
Getting back to WaveAid: when the Oils finally came on, we were about 10 metres from the stage, right in front. They kicked off with Read About It, not one of their better-known songs, but the entire audience (mileage may vary), for as far as we could hear, was singing along. A couple standing next to us who weren't Oils fans even had huge grins on their faces, marvelling at the chorus of 47000 voices (slightly staggered) who knew most of the lyrics. Peter's opening words, "I'm probably the only member of parliament currently singing in a band [..] some of us may have moved on, but what we stand for still hasn't changed", then set the tone for the rest of their set.
The highlight of the night was when drummer Rob Hirst started a simple snare beat, which the audience immediately recognised as Dead Heart, and started humming Doo do, doo do, doo do do for about a minute before any other instruments kicked in. The couple next to us kept looking around, unable to stop grinning. To then hear 47000 people singing about giving Australia back to the Aborigines was probably the most significant and positive political statement I've heard made in this country for many years. And the music was great as well. 🙂
A bunch of lefties? I doubt it. Aussies enveloped in the emotional rhetoric of Australia’s greatest ever live band? Perhaps. 47000 people who care about human life, tolerance and a world population living together in peace? Most likely. John Howard take note. You have three years.
Update: I’m pretty sure the songs were Read About It, Say Your Prayers, Best of Both Worlds, King of the Mountain, Forgotten Years, Power and the Passion, and Dead Heart. I can’t remember the exact order.