Drew Scott Daniels' Blog Personal, usually technical posts

August 20, 2005

20050820

Filed under: Uncategorized — admin @ 8:00 pm
One thing I've been meaning to write about is interesting and common
math problems. I'll start with some easy ones that I've had to do at
work (no trade secrets, just common math in the trade).

Notes:
- I'll use the prefix "0x" to denote that a number is written in hex
notation. E.g. 0x10 is 16 and 0x1F is 31.
- There are 8 bits in a byte (I may not get to that until later problems
though).

Largest packet aligned buffer:
MPEG-2 transport streams use either 188 or 204 byte packets. In order to
transfer a high speed transport stream, large buffers are needed. Large
buffers reduce the number of interrupts per second, and make for
effective DMA transfers. To effectively process an MPEG-2 transport
stream, buffers should be a multiple of the packet size. Splitting and
joining buffers is not only "a pain" to program, but also requires
otherwise unnecessary processing power.

With a maximum buffer size of 0x20000 (131072) bytes, and a chosen
packet size of 188, what is the largest packet aligned buffer size
allowed?

The answer is simply 131072/188*188 when the division is integer
division (i.e. the fractional part is thrown away). To do this with a
calculator that only does regular division, one simply needs to keep
only the integer part of the quotient. In this case 131072/188=697.19...,
so I just clear the calculator, type 697*188 and get 131036 (0x1FFDC).
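
The same calculation as a minimal Python sketch. The function name
largest_aligned is made up for illustration; the only thing it relies on
is that // is integer division.

```python
def largest_aligned(max_bytes, packet_size):
    """Largest multiple of packet_size that still fits in max_bytes."""
    # Integer division drops the partial packet, multiplying restores bytes.
    return max_bytes // packet_size * packet_size


size = largest_aligned(0x20000, 188)
print(size, hex(size))  # 131036 0x1ffdc
```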

...

With a maximum buffer size of 0x20000 (131072) bytes, and a chosen
packet size of 204, what is the largest packet aligned buffer size
allowed?

The answer is 131072/204*204 where the division is integer division. The
final number is then 130968 (0x1FF98).

What is the minimum buffer size that is divisible by both 188 and 204?

This is the least common multiple of the two sizes. To find it, factor
both numbers, count the factors they share only once, and multiply
everything together. The factors of 188 are: 2, 2, 47. The factors of
204 are: 2, 2, 3, 17. They share 2*2, so the answer is
2*2*47*3*17=9588 (0x2574), or 188*204/2/2=9588 (0x2574).
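
As a rough sketch in Python, the shared factors are the greatest common
divisor, and dividing it out of the product gives the same number. The
helper name lcm is an illustration, not something from a particular
library.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple: divide the shared factors out once."""
    return a // gcd(a, b) * b


print(gcd(188, 204))                      # 4
print(lcm(188, 204), hex(lcm(188, 204)))  # 9588 0x2574
```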

With a maximum buffer size of 0x20000 (131072) bytes, and a packet size
that can only be 188 or 204, what is the largest packet aligned buffer
allowed?

Using the knowledge from above that 9588 bytes is the smallest multiple
of both 188 and 204, it's simply 131072/9588*9588 (again where the
division is integer division). So the answer is 124644 (0x1E6E4).

Other numbers of interest:
- 196 = 188+8 (64 bits is 8 bytes) for a 188 byte packet with a 64 bit
timestamp.
- 212 = 204+8 for a 204 byte packet with a 64 bit timestamp.
- 512 bytes is a common write size divisor for hard drives.
- 192 = 188+4 (32 bits is 4 bytes) for the MPEG stride (MPEG 2 transport
stride?) format (HDV).
- 4 bytes (32 bits) seems to be a preferred DMA number (the PCI bus is
always at least 32 bits wide).
- 8 bytes (64 bits) might be preferred for certain DMA.

I'll probably write myself up a quick reference. When programming, you
can have the program calculate the right numbers given any buffer size,
packet size or other factor.
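
A minimal sketch of what such a program could look like, assuming the
packet sizes listed above and one extra alignment factor (4 bytes for
DMA, or 512 for disk writes). The names quick_reference and
also_align_to are invented for this example.

```python
from math import gcd

MAX_BUFFER = 0x20000                      # 131072 bytes
PACKET_SIZES = [188, 192, 196, 204, 212]  # sizes mentioned in this note


def lcm(a, b):
    """Least common multiple of a and b."""
    return a // gcd(a, b) * b


def largest_aligned(max_bytes, unit):
    """Largest multiple of unit that fits in max_bytes."""
    return max_bytes // unit * unit


def quick_reference(max_bytes, packet_sizes, also_align_to=1):
    """Return (packet size, alignment unit, buffer size) for each size."""
    rows = []
    for size in packet_sizes:
        # The buffer must be a multiple of both the packet and the factor.
        unit = lcm(size, also_align_to)
        rows.append((size, unit, largest_aligned(max_bytes, unit)))
    return rows


for size, unit, buf in quick_reference(MAX_BUFFER, PACKET_SIZES, also_align_to=4):
    print(f"packet {size:3d}  unit {unit:5d}  buffer {buf:6d} ({buf:#x})")
```

Passing also_align_to=512 instead gives buffers that are also whole
multiples of the common hard drive write size.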
Originally from: http://www.boxheap.net/ddaniels/notes/20050820.txt

August 17, 2005

20050817

Filed under: Uncategorized — admin @ 8:00 pm
So today I got a new USB key. It's a Kingston DataTraveler. I decided
that I should set it up so that I can use it with the laptop I use on a
regular basis (a very old Toshiba Satellite 310CDS that's rattling). My
first step was to consider the file system format. I was surprised to
find that I couldn't format the device in NTFS format. I left it
formatted at the default FAT32 and decided to get on to other things.

At home tonight I spent some time downloading the drivers for the device
and attempted to install them. The driver installer is an InstallShield
created one that's been WinZipped into a self-extracting file (sometimes
called an sfx). Three layers of compression and installer junk managed
to make the under 49KiB of driver files take over 1MB. The second
frustrating thing I ran into is that the installer is designed to detect
the operating system, and refuse to install if it doesn't think it'll
work. Well, I guess before that I had read a FAQ from Kingston saying
that Windows 95 doesn't support USB drives at all.

Before I get ahead of myself again, I'll go back and say that I tried to 
do some research on what Windows 95 supports in the way of USB drives. I 
didn't manage to find much, but I did find the usual indications that 
earlier versions of Win95 didn't have any USB support, or that it wasn't 
working. I already knew that USB support worked on my computer.

I guess I should have seen things coming ahead of time. In December I
had looked at trying to get pictures off a Fujifilm FinePix digital
camera. It too had an annoying installer, and claimed not to work with
Win 95. It further had a bunch of software bundled with its
installation that I still haven't bothered to figure out. Luckily, the
installer for the driver itself wasn't hard to find, and I managed to
get the device driver, and some of the software installed.

Despite getting things installed for the FinePix camera, the software
complained about a missing dll function, and the driver didn't seem to be 
working. I decided that with my many licences of Windows, I should try 
to upgrade certain dll's with versions from newer versions of Microsoft 
Windows. My results were of course that some of the important dll's 
could not be replaced.

That got me thinking again about getting open source replacements for
certain components. I looked for a while, and decided that without a
better understanding, I might end up accidentally installing a dll that
needs a Linux shared library (.so) or something. My following of the
Wine Weekly News (WWN) on http://www.winehq.com and reading the ReactOS
developers/kernel mailing list indicated that some dll's from these
projects were definitely dependent on components that I'm not ready to
replace.

So more recently (getting back to the USB key), I did another search on
the subject of replacing Microsoft Windows 95 dll's with open source
compatible versions. I'm also now considering replacing the kernel and
other core files. I did find that WWN shows that they've been building
PE versions of their dll's for Win32, but it's not clear which can
replace the dll's in Windows 95. I get the impression that files from
ReactOS might be a better replacement than Wine's as they'll have less
Linux, BSD and Solaris related stuff in them and be created with binary
compatibility in mind for even more core pieces (e.g. no required
wineserver).

To date I've had no luck with either the USB key or digital camera under 
Windows 95. I've decided that in order to start replacing Win95 on this 
notebook, I'd better get a better understanding of the dependencies and 
compatibilities of different components. To do this I'd like to get or 
create a list of files, a graph (tree?) of the dependencies between 
files, and a fresh compatibility status of the files from whatever 
source I choose. Unfortunately ReactOS's compatibility page doesn't jump 
out at me in searches (I remember seeing it once or twice). I also 
believe both ReactOS and Wine don't list their compatibility in relation 
to Windows 95, but to whatever the latest version of the component is.

So the process I'll probably follow will start with listing the
operating system files on the computer I'm targeting. Then I'll probably
use something like dependency walker (depends.exe from Sysinternals?)
to figure out the dependencies of each file (as best I can). Then I'll
look at the compatibility status on the web. Last, I may have to compare
the exports of each original file and its replacement. Since no one else
seems to have published this information, I'll probably write up my
findings as I go. I might even make it easier to install Open Source
replacement components for other versions of Windows by performing the
same process using fresh installs of other versions.
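
The dependency and export steps above can be sketched roughly as
follows. Everything in the tables is made up: the file names and export
names are placeholders, and in practice the lists would have to be
collected with a tool that reads the import and export tables of each
file.

```python
# Hypothetical import lists, one entry per file (placeholder names).
IMPORTS = {
    "app.exe": ["gui.dll", "core.dll"],
    "gui.dll": ["draw.dll", "core.dll"],
    "draw.dll": ["core.dll"],
    "core.dll": [],
}


def all_dependencies(name, imports):
    """Every file that name needs, directly or through other files."""
    seen = set()
    stack = list(imports.get(name, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(imports.get(dep, []))
    return seen


def missing_exports(original, replacement):
    """Exports the original file offers that the replacement lacks."""
    return sorted(set(original) - set(replacement))


print(sorted(all_dependencies("app.exe", IMPORTS)))
print(missing_exports(["OpenThing", "CloseThing", "ReadThing"],
                      ["OpenThing", "ReadThing"]))
```

A replacement whose missing_exports list is empty is only a candidate;
it says nothing about whether the functions behave the same way.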

It's getting late now and I'm getting tired. I was planning to also 
write about how to use unshield and winzip to extract files from 
annoying installers. I also felt the need several times to explain why I 
wanted open source replacement files, and didn't upgrade Windows 
(remember I do have licences to newer versions). I guess I can quickly 
say that I like having free access to the source of what I'm using so 
that I or just about any other programmer can enhance/fix it. I also 
don't want to install Windows 98 or later on this laptop because it may
take more system resources, not run, and well I'd rather maximize the 
use of my Windows 95 licences before using other ones. I've tried 
ReactOS and Wine, and I know they're still not 100% replacements for 
Windows (although extremely close nowadays). I also believe that other 
people share my viewpoints and/or situations.

Maybe later this week I'll write more on the topic of replacing Windows
components or Windows Device Drivers (wdm, ndis, inf, the wonderful
dpinst.exe and more), but for now it's time for me to get some sleep...

Originally from: http://www.boxheap.net/ddaniels/notes/20050817.txt

August 11, 2005

20050811

Filed under: Uncategorized — admin @ 8:00 pm
I was planning on writing about the problems I faced at work looking up
open source software for SMPTE 125M conversion. I kept finding SMPTE
timecode stuff (for MIDI), and other usages of the acronym SMPTE without
reference to which standard was being used. The ones related to SMPTE
125M are SMPTE 292M (HD-SDI), SMPTE 259M (transport of SDI and SDTI),
SMPTE 305M (sometimes called SMPTE 305.2M which is SDTI), and the
document on ancillary data. Actually SDTI really is quite different from
SDI except that it goes over 259M.

Anyway, tonight I think I'll write a bit about linking and Google. Yes,
part of the reason that I'm writing these notes is to increase the
ranking that I'll get for topics that I'd like employers to see. The
bigger way that I plan to get a good ranking is something I've
accidentally found before. I've put a one line signature in my e-mails to
mailing lists with my resume's URL. I was hoping I could find someone on
the mailing list that might be interested, or might refer me to someone,
but instead I found that the html mailing list archives looked to be
increasing the rank of my resume. I guess this was a neat trick that can
work on Google, and maybe on other search engines that look at what's
linking to a page to give it a score.

When I finally am happy with the testing scripts that I'm working on for
my tarball enhancements I'll post the results to various mailing lists 
that are development forums for projects with large tarballs (e.g. the 
lkml, some kind of gimp mailing list, maybe some OpenOffice.org AKA OOo 
mailing lists...). I've got my resume's URL in the scripts themselves, 
but I also plan to put my resume URL tagline in my messages.

One of my problems with my tarball enhancement postings is that I'll
want a permanent place with my domain name that I can host the scripts,
but I'm getting free hosting from a friend (thanks Dean). I don't want
to generate a lot of hits on my friend's server due to the fact he
likely has better uses for his bandwidth, and his ISP may not appreciate
it. To prevent such a load on the link to his server (and his server), I
plan to keep the scripts only on the mailing lists (archived in their
archives) until interest drops down a bit. I figure a few weeks would
do, but I'll probably wait a few months.

I'm really quite keen to get my scripts out the door, but I feel they're
not yet ready to stand up to the kind of criticism that one gets on the
Linux Kernel Mailing List (lkml). I've got a script to do the actual
tarball creation, and one to show the difference between a normally
generated one, and the one my script makes, but I don't have something
showing the amount of time that it takes. Measuring the sorting isn't
easy, as it's a series of piped commands. My shell scripting really
isn't put to enough use for me to be able to quickly work around such a
problem. I've checked a few howto's like the bash one, I've asked in the
bash scripting IRC channel, but I couldn't find an answer. I decided to
put the commands into a separate script and time that whole script.
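
One way to time a whole pipeline rather than its individual commands,
sketched in Python under the assumption of a POSIX shell with sort and
uniq available. The function name time_pipeline and the sample input
are invented for this example and are not the actual tarball scripts.

```python
import shlex
import subprocess
import time


def time_pipeline(stages, data=b""):
    """Run the stages as one shell pipeline and time the whole thing.

    stages is a list of argument lists, e.g. [["sort"], ["uniq"]].
    Returns (pipeline output as bytes, elapsed wall-clock seconds).
    """
    # Quote each stage and join with "|", as it would be typed in a shell.
    command = " | ".join(shlex.join(stage) for stage in stages)
    start = time.perf_counter()
    result = subprocess.run(command, shell=True, input=data,
                            capture_output=True, check=True)
    elapsed = time.perf_counter() - start
    return result.stdout, elapsed


if __name__ == "__main__":
    names = b"b/file.c\na/file.h\nb/file.c\n"
    output, seconds = time_pipeline([["sort"], ["uniq"]], names)
    print(output.decode(), f"took {seconds:.4f} s")
```

In bash itself, the time reserved word placed in front of a pipeline
also reports the time for the whole pipeline, which may be enough
without a separate script.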

The other problem I've run into is testing. My home computer was taking
a beating compressing and untarring, etc. I decided to use my
SourceForge compile farm shell to do the testing, but it's a pain to put
files onto them. It took me a while before I figured out I had to
download the files to my computer, and then upload them to the compile
farm's central server via sftp or scp. That's something I can do, but it
really compounds another problem I'm having. It takes me a while to make
progress on my free time coding projects, so new target files keep
coming out for me to test. I want to be able to post on the lkml the
results of recompressing the latest 2.6 and 2.4 kernels. I keep
optimistically downloading the latest kernels and then having real life
interrupt things long enough for me to need a new version to continue.
I'll stop doing that for a while though until I've actually got a draft
sitting in my postponed box of an e-mail to the lkml with the scripts
already finished and attached or actually inline I think. That's
another problem. The lkml only accepts certain posts, and Linus only
usually accepts things that are in a certain format (plain text inline
iirc). That put me on a tangent of looking up the mailing list rules,
and reading the Linux Weekly News. It'll likely do the same once I get
close enough again.

So with all my knowledge, reading, and interest in digging deep into
open source stories that I see written/posted, I've thought about trying
to get paid to write. These notes are a bad example of my ability to
write, but a good example of what I enjoy writing about. I've been
solicited once to write a book on Intrusion Detection from a genuine
publisher, but I kind of "fubbed" my response. I said that I'd be
interested in contributing, but I didn't think I'd have time to write a
whole book. I kind of regret doing that, but I think it was the right
thing to say (just look at my bad record finding time to do coding). I'm
hoping however that a paying gig would actually let me take some time
away from real life to actually get things done (and I'm sure it would).
Of course I've got to strike a balance to keep my home life happy and
healthy (family, friends, and my own condition). I've offered to write
a piece on the history of the BSD's to the Linux Weekly News, but they
didn't seem interested. They do post BSD articles, and I was pitching
that I could write one that would show the parallels between AT&T vs The
Regents of the University of California (Berkeley, BSD) and the current
SCO vs IBM etc. It's interesting how the history repeats itself. For
good reference I'd suggest reading the FreeBSD mailing list archives (a
Google search found some good stuff).

Later I might publish the research that I used as part of my pitch for
my BSD history repeats itself story. I'm also probably going to consider
writing about why I don't want to publish my unrealized ideas. I'll also
probably talk about:
- Why I don't write about office politics
- Why I don't write much about my personal private home life (well,
maybe I made that clear <g>)
- My music ideas
- My thoughts and research into a self powered home (well actually
getting power from alternate sources like sun, wind, water...)
- Thoughts on using "image stacking" for amateur (and hopefully
professional) astronomy (I'll talk about this because other people have
already implemented some of this)
- Some ideas for how people can generate data that's easier to compress
(e.g. typing in lower case when there's the option, removing obvious
redundant information, using the same words...)
- Perhaps my ideas on natural language processing
...

I may eventually post my project ideas from the last fourteen years that 
I've been writing on paper.

Consider sending me money! My resume is at 
http://www.boxheap.net/ddaniels/resume.html

Oh, and I'll probably write about resume creation and open source tools 
to do it (hey, maybe lwn.net would be interested in buying that 
article).

Originally from: http://www.boxheap.net/ddaniels/notes/20050811.txt

August 10, 2005

20050810

Filed under: Uncategorized — admin @ 8:00 pm
There don't appear to be any adopted standards for MPEG over IP. IP
over MPEG looks more interesting. Just packetize an IP stream into a
packetized elementary stream (PES) and multiplex it into a valid MPEG-2
Transport Stream. MPEG-2 typically gets transferred over DVB-ASI, DVB-C,
DVB-S, DVB-T and other protocols (even "ATSC" AKA SMPTE 310M).
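
To make the packet layout concrete, here is a simplified sketch that
cuts a block of bytes into 188 byte transport stream packets. It only
builds the 4 byte packet header (0x47 sync byte, 13 bit PID, 4 bit
continuity counter) and leaves out the PES or section framing a real
encapsulation needs so the receiver can find the datagram boundaries.
The function name, the PID value and the 0xFF padding of the last
packet are assumptions made for illustration.

```python
SYNC_BYTE = 0x47
TS_PACKET_SIZE = 188
HEADER_SIZE = 4
PAYLOAD_SIZE = TS_PACKET_SIZE - HEADER_SIZE  # 184 bytes per packet


def packetize(data, pid):
    """Split data into 188 byte packets carrying the given PID."""
    if not 0 <= pid <= 0x1FFF:
        raise ValueError("PID must fit in 13 bits")
    packets = []
    for counter, offset in enumerate(range(0, len(data), PAYLOAD_SIZE)):
        chunk = data[offset:offset + PAYLOAD_SIZE]
        # payload_unit_start_indicator is set on the first packet only.
        start_flag = 0x40 if offset == 0 else 0x00
        header = bytes([
            SYNC_BYTE,
            start_flag | (pid >> 8),   # top 5 bits of the PID
            pid & 0xFF,                # low 8 bits of the PID
            0x10 | (counter & 0x0F),   # payload only, continuity counter
        ])
        # Simplification: pad the last payload with 0xFF to full size.
        packets.append(header + chunk.ljust(PAYLOAD_SIZE, b"\xff"))
    return packets


packets = packetize(b"x" * 200, pid=0x100)  # 0x100 is an arbitrary PID
print(len(packets), [len(p) for p in packets])
```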

So how do you packetize IP packets to go into an MPEG stream? Well that
depends on the source. I'd like to think that any IP source "worth its
salt", is from a live network. Thus a network feed would need to be fed
into the packetizer, multiplexed, and put out over a different type
of device. I've heard of some people making a network device driver for
DVB-ASI cards, but at least one engineer I talked to said there's
probably a better way. He suggested keeping the regular characteristics
of the ASI device, and doing the packetizing in application space. I
managed to convince him however that the conveniences of creating a
network device which can be bridged would be far better. He stuck with
the separate device driver idea however and suggested one driver could
use the other.

So then the question is, how do you create a network device driver
that's just a packetizer, multiplexer and forwarder? No doubt there are
some good examples out there, and NDIS should make it easier. I still
worry that doing more than elementary processing in a driver might
cause some strange system behavior. I guess I should also say there's
probably an even easier way to do things in Linux and FreeBSD variants,
but I'm mostly focused on the Microsoft world as that's what I'm told by
Marketing is what's wanted.

On the opposite end you need a depacketizer, or something to demultiplex
the stream, and put IP back out onto the network. I've seen this done in
software, and that might make more sense on this side of transfers. The
engineer that I speak of above however suggested that the unidirectional
nature of MPEG-2 transport streams would give another problem,
associating one direction of traffic with the other.
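
A rough sketch of the receiving side in software, assuming a packet
aligned stream of 188 byte packets with no adaptation fields: check the
0x47 sync byte, pick out one PID, and join the payloads. The function
names are invented, and turning the joined payload back into IP
datagrams would still need whatever framing the sender used.

```python
SYNC_BYTE = 0x47
TS_PACKET_SIZE = 188
HEADER_SIZE = 4


def split_packets(stream):
    """Yield each 188 byte packet, checking the sync byte as we go."""
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at byte offset {offset}")
        yield packet


def packet_pid(packet):
    """The 13 bit PID from bytes 1 and 2 of the packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def payload_for_pid(stream, pid):
    """Join the payload bytes of every packet carrying the given PID."""
    return b"".join(packet[HEADER_SIZE:]
                    for packet in split_packets(stream)
                    if packet_pid(packet) == pid)


# Two hand-built packets on PIDs 0x100 and 0x200, multiplexed together.
stream = (bytes([0x47, 0x41, 0x00, 0x10]) + b"A" * 184 +
          bytes([0x47, 0x02, 0x00, 0x10]) + b"B" * 184)
print(len(payload_for_pid(stream, 0x100)))
```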

I'm not quite sure how other people bind one transfer direction with
another, but I remember several satellite companies offering service
that beamed high speed broadband internet access to customers and
accepted data back to them via telephone modem. So schemes that pair two
directions of traffic over separate devices have been around for a while.
I just hope that modern network stacks are smart enough to remember that
it's allowed.

I remember someone telling me that the ARPA network was an experiment
designed with the goal that it be able to stay up, even if one link in
the network went down. It failed, or at least that's the punchline. The
modern Internet can't reroute if there's a failure in a router. There
was a fire in a telecom building in Toronto, and connections from
Manitoba Telephone Services (MTS) to Shaw in Winnipeg went down. I've
also seen where an outage in Shaw's network caused places to become
inaccessible, but if you had a proxy on a CA network accessible address,
you could access the rest of the internet. Those are just two local
examples that I know about. The CA network thing is political (I'm told
they're not allowed to carry commercial data due to their funding
grants). For what it's worth, I've also seen my share of misconfigured
routers; the more obvious cases were with major telecom companies.

So back to MPEG transport streams. I know companies like Norsat have
been selling "solutions" to do these things for years, so I think
there's a market. Identifying the market potential is difficult for this
because it's not something most broadcasters, stations, and local
distributors are looking for. It's also not something that's really even
remotely accessible to consumers.

A similar issue that I've thought about for even more years is multiple
links between computers to increase throughput. I know lots of other
people have looked at bridging and bonding, but I wanted to look at it
at an even more insane level, serial ports. Actually I wanted to look at
parallel ports, modems, ethernet, etc. I suppose it is possible to bond
all these links together, but it certainly isn't common enough that it's
as easy as listing the links (at least as far as I know).

So why bother with all this legacy stuff? Why not build a new network
card that can communicate at the full bus speed? Well actually we're
pretty close to that now. From my own experiences I've calculated that
modern HD-SDI cards must be close to maxing out the bus throughput.
I've also learned that multiple cards on the same bus can't allow a
faster network connection as of course there's only the one bus. Of
course I've seen computers with multiple buses, but it's hard to know if
they're truly independent, or if they're more likely bridged. Even a
bridged network of buses can allow each bus to operate almost
independently. If the parent bus is faster than its children combined,
there may be an advantage to using multiple cards.

Even further it's important to note that most modern buses have
bottlenecks. When was the last time you looked up the DMA latency of the
motherboard you wanted to buy? I'll wager never. I've thought about how
useful this could be to consumers and whether there was a way I could
get the company that I'm working for to publish regular results of DMA
throughput and latency. We could then get free motherboards. The idea
likely wouldn't work though as that's not really what the business
does.

Other crazy ideas I've had include using every processor on the system
to do computations including the IDE/ATA hard drives (they have RAM
too!). Alas, most of them would be very convoluted to actually put to
use.

The more recent idea that I've had (since Mark Nelson's "random" binary
file challenge was posted), was to figure out a list of common
instructions and library calls for which I could get more output than
would be required to issue the request for data (e.g. more bits from
register results than from instruction cost...). This idea has some
potential, but my current "hurdle" is finding the time to get and go
through a list of CPU instructions. Getting a list of available function
calls is also a challenge although maybe a nice program to do it already
exists (just get the exports from all dll's etc?).

So as you might be beginning to see, one of my primary interests is
data compression. I used to be interested in the pure pattern finding,
and making the smallest representation possible of common data, but then
I started working in multimedia. It became obvious very fast that the
speed of compression actually is important (not just to those who can't
wait). If things don't compress fast enough you can get overruns, data
loss and ultimately data corruption (even if that just means missing
bits/bytes/frames...).

One of my past pet projects was zlib compression of SDI (see StreamBed's
deflate option). I've found that I can gzip at 270,000,000 bps (that's
270 Mbps in SI notation). The small b means bits of course. The problem
is that it either needs a fast processor, or a simple pattern (like
colour bars). It may even need both. I haven't had the time to check.
Unfortunately without the inflate option in StreamBed, customers aren't
too interested yet, and without customer interest my boss isn't too
interested yet either.
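
A small way to check the processor versus pattern question on any given
machine, sketched with Python's zlib module. This is not StreamBed's
code: the repeating byte pattern is only a stand-in for colour bars,
random bytes stand in for hard to compress video, and the name
deflate_rate is invented.

```python
import os
import time
import zlib

SDI_RATE_BPS = 270_000_000  # the 270 Mbps rate mentioned above


def deflate_rate(data, level=6):
    """Compress data once; return (compressed size, input bits per second)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    rate = len(data) * 8 / elapsed if elapsed > 0 else float("inf")
    return len(compressed), rate


pattern = bytes(range(16)) * 65536  # 1 MiB of a short repeating pattern
noise = os.urandom(len(pattern))    # 1 MiB that deflate cannot shrink

for name, data in (("pattern", pattern), ("noise", noise)):
    size, rate = deflate_rate(data)
    print(f"{name}: {len(data)} -> {size} bytes at {rate / 1e6:.0f} Mbps, "
          f"keeps up with SDI: {rate >= SDI_RATE_BPS}")
```

The rates depend entirely on the machine and the compression level, so
the printed numbers are only meaningful as a comparison between the two
inputs.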

I later plan to talk about:
  • compression of MPEG transport streams (they're already MPEG compressed, but the tables are text, and there's that predictable 0x47 once per packet or 188/204 bytes).
  • lossy MPEG table compression (recreate it to meet spec)
  • Alternate SDI compression
  • Compression in firmware
  • My literally magic tarball scripts to improve general tar.gz and tar.bz2 compression with an order(1)+small sort preprocessor
  • My extensive prediction by partial match (PPM) research and documentation (I've spent at least the last 5 years working on a new algorithm that I have high hopes for, but strange results)
  • My ideas (hopefully implemented soon) for a very simple honeypot like intrusion detection system
  • problems with current network security tools
  • Perhaps products and development directly related to my jobs
Drew Scott Daniels' Resume
Originally from: http://www.boxheap.net/ddaniels/notes/20050810.txt
