Finding, installing, and integrating
an Open Source application
into a Legacy Software System
From here to there, and back again – a trip into Open Source Land
My background, as my brief bio on the GHRUG website notes, runs from days of old, as MPE goes. My first access to a 3000 was in 1976 at a secondary school in Maryland. I first worked on 3000s shortly thereafter - in the days of the venerable Series-II. First as a 3rd party vendor's programmer/analyst, then in sales support, and finally as a senior technician and application architect. I eventually joined the ranks of customers that used MPE based systems, then found myself progressing upwards into management, and finally I've come full circle, back to being a vendor – but I've never presented at a conference before. So please, bear with me.
Prior to this adventure, I'd had no practical open source work experience, not even with the applications provided in recent years by HP. Most of my recent contracts had been with customers that seemed content to use their homegrown and/or vendor-supplied packages without much thought of trying to modernize them by web or email enabling their applications, let alone anything more involved.
And now a couple of questions for the audience:
How many have any exposure w/HP packaged Open Source software (bind, sendmail, syslog, apache, perl, samba, ldap), either as part of your OS distribution – or via Jazz downloads?
How about with non-pre-packaged open source for MPE (private ports like Mark Bixby or Lars Appel's – or ones you've done yourself)?
Before we get too far into this – perhaps we should try to make sure we all have the same basic understanding of what 'open source' is, and how it differs from 'free software'. One of the things I've discovered, while learning about this thing called 'open source', is that there's as much division in the ranks of leaders in the so called 'free software' or 'open source revolution' – as there likely is in ANY revolution. The broadest difference seems to be in the scope of the intent, definitions used, and how (if at all) a profit could be made from these things. Because I'm not a lawyer, nor do I play one on TV, I'd rather focus on the practical differences – with regard to software that is portable enough that it can be converted (or – to use the proper term 'ported') to run on the 3000.
For many years, there has been quite a collection of 'freeware' for the 3000, almost exclusively made available in binary executable form only, and nearly always the copyright is retained by the original owners. That is – a limited right to use may be granted, but not right of ownership. This is not the type of software we'll be talking about today.
To those who cut their proverbial teeth developing proprietary software, open source is, at least at first, a strange animal. Generally, open source software is broken down into a unit called a 'package'. A package may be as simple as a single piece of source code for a library routine or stand-alone executable - and its supporting build tools & documentation, or as complex as an entire ERP system, with thousands of components – and that likely has, in addition, numerous external dependencies on other packages! The original author of the software package may (but doesn't always) maintain ownership, but grants open use of the source code for modifications, and even derivative works. Hence the name such code is commonly known by: open source.
If you're interested in more information on open source and/or free software – I'd suggest checking out the Free Software Foundation website www.FSF.org, where there are some very thought provoking essays. And if you'd like additional discussion (and possibly alternative views), also try Wikipedia, at least as a starting point.
At root, according to the FSF website, is the idea that “Free software is a matter of freedom, not price”1, and they reference the “4 freedoms of software”2. Generally, the arguments given for use of 'Un*x-like' software (with its inherent portability advantages) are something more akin to those from a competing organization: Open Source Initiative. Their view is that:
“Open source is a development method for software that harnesses the power of distributed peer review and transparency of process. The promise of open source is better quality, higher reliability, more flexibility, lower cost, and an end to predatory vendor lock-in.”3
From a functional standpoint, I see OSI's view as a logical subset of FSF's. For the purposes of this talk, OSI's definition is sufficient. It's also useful in explaining to your management – or their legal staff - why open source should be considered for production use in your organization!
Where to find potential applications, and how do you get them to run on your system?
Open Source applications for MPE come in one of three flavors:
Ported with native compiled binaries – and packaged so as to be ready to 'load-and-go',
Ported, with or without native compiled binaries (for whatever reason), that also isn't packaged in a 'MPE' friendly way, and
Not ported yet. ;-)
So, just how do you set up an environment for a non-prepackaged application, and why does that make a difference? Well – it comes down to what 'portability' means, really. To 'port' is to convert software to run on a different platform. Sometimes it's as simple as just recompiling. Other times, it's not so easy….
In 3000-land, we're spoiled, to be perfectly honest! We're used to software being EXCEPTIONALLY upwards compatible, and even across architectures (even multiple generations of architectures!) – at a 'binary' level, for both executables AND data bases and files. In Un*x land, which is where most portable software originates – portable means 'at the source code level' – mostly. The reason I say 'mostly', is that there are variations from one version or type of Un*x to the next. In order to accommodate that, developers had to come up with ways to handle the differences, such that when an application build was done, the platform it was on would be recognized. By use of conditional compile switches, that were automatically set by the 'configuration' scripts – or tools such as 'autoconf' from the GNU library, the correct bits of code would be used at compile time. This is really where the magic of 'porting' occurs – where the person or persons that actually do port the software, make the appropriate changes for each target platform. Ideally, those changes are passed back to the 'maintainers' of the application, for official integration – so that future updates to the application will compile 'out of the box'.
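To make that a little more concrete, here is a minimal sketch (in Python, purely for illustration) of what a configuration step boils down to: look at the platform, then decide which compile-time switches to hand to the compiler. The switch names and the platform table are made up for this example and do not come from any real package. Real 'configure' scripts also probe for individual features rather than consult a fixed table.

```python
# Hypothetical table: platform name -> compile-time switches.
# None of these switch names come from a real package.
PLATFORM_SWITCHES = {
    "Linux":   {"HAVE_DEV_RANDOM": 1},
    "OpenBSD": {"HAVE_DEV_RANDOM": 1},
    "MPE/iX":  {"HAVE_DEV_RANDOM": 0, "NEED_ENTROPY_DAEMON": 1},
}


def compiler_flags(platform_name):
    """Turn the switches for one platform into -D options for the C compiler."""
    switches = PLATFORM_SWITCHES.get(platform_name)
    if switches is None:
        # This is the "not ported yet" case: someone has to add the entry.
        raise ValueError("no port exists yet for " + platform_name)
    return ["-D%s=%d" % (name, value) for name, value in sorted(switches.items())]


if __name__ == "__main__":
    print(compiler_flags("MPE/iX"))
```

The C source then wraps the platform-specific bits in '#if' blocks keyed on those switches, which is the part the porter actually has to write.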
Where do you find open source applications already ported to the 3000? Many people don't realize it, but they've probably already got quite a number of ported applications – already on their servers! The last several releases of MPE have included Syslog, Bind, Samba, Apache, and even Sendmail. These are all good examples of popular and very useful open source packages. However, because their use hasn't been actively championed, nor user success stories with them heralded - some sites are barely aware that they're even there. It's a shame, as they're fully supported by HP!
So what do you do, if you need something that's NOT already supplied by HP? Fortunately, there are quite a few packages, tools, utilities, and even programming languages available – already ported by others in our community. Some of them are distributed with 'precompiled binaries' (ready to use – just install the files to your server either via native restore, or via 'tar'), while others may require recompiling (and more setup before you can do that...).
Ok, but where do you look to find these packages? Unfortunately, the list is many and varied. For applications that have precompiled binaries – the top 3 places to look are: the 'software' section of HP's “Jazz” server (jazz.external.hp.com/src/), the website of Mark Bixby (of HP fame) at www.bixby.org/mark/, or the website of Lars Appel (also – formerly of HP) at www.editcorp.com/Personal/Lars_Appel/index.html. Both Mark's & Lars's websites offer software – as well as white papers about HOW to do porting, should you want to give it a try yourself. Mark's PowerPoint presentation of August 2001 on 'Open source development on MPE' identifies the kind of things that port well, and what DOESN'T port so well – as well as the things that fall somewhere in between. He also points out where to find help (the 3000-L is your Friend!), and reinforces that since most open source software is DESIGNED to be easy to port, you really don't need to be a Un*x or 'C' guru.
If what you're looking for can't be found at one of these places – try doing a search on the 3000-L, as there are a few packages that have been ported by other individuals, but that are not on the above listed sites – and there are few, or in some cases – no other links to them.
Ok – so maybe you still haven't found what you're looking for. Now what? Well, you can either hire someone to help you find, and if possible – assist in porting an application or tool for you, or – you could do it yourself. In order to avoid duplication of effort, I would suggest making a post to the 3000-L, asking if anyone else has ported that particular package – and if the application's website has a support forum, ask the question there as well.
Prior to late September of 2007, I'd have had a much harder job explaining how to set up a porting environment on your 3000, but thanks to the efforts of Vidya Sagar of HP, a white paper called 'Porting Open Source applications on MPE/iX – A case Study of Samba-3.0.22' – has done the work for me. It does a very thorough job of describing the limitations/peculiarities of the POSIX implementation on MPE, as well as some ways to get around them. It also describes setup and installation of all the various bits and pieces necessary to do porting and/or development work with POSIX compliant software. This paper can be found on Jazz, on the Samba download page at jazz.external.hp.com/src/samba/.
Where can you find open source applications that may be portable? There are 3 top sites: gnu.org, sourceforge.net, and freshmeat.net. Gnu.org is the home of the FSF's O/S, compiler, tool, and utilities software. Sourceforge.net is both a searchable library of applications – as well as a 'home' for hosting many open source projects. Freshmeat.net is primarily an 'announcement' site. Prepare to be astounded at the volume of software that is released/updated daily at that site. When perusing sourceforge.net or freshmeat.net for a candidate package – look for command-line or server applications that are 'POSIX' and failing that, Un*x applications – and that don't have a 'gui' interface, given that X-Windows never really took off on the 3000. An exception to the 'POSIX' or 'Un*x' rule might be scripting language based applications written in php, perl, or java – but beware of external dependencies on packages that are not yet available on the 3000. And I'll save you some time: Don't even bother trying to look for 'MPE' or '3000'. ;-)
To quote a reasonably famous comedian, who's also a native son of Texas: “And I told you all that, so I could tell you this:”
Once upon a time… there was a specialty retailing company with a legacy (that is: old, but reliable) system, that needed a secure communication solution for their Point-of-Sale polling. Point-of-Sale is trade-lingo for the cash-registers and supporting equipment. It's also commonly abbreviated as 'POS', for reasons that become obvious if you have to work with them very often.
This legacy system happened to be 'home-grown' over a number of decades, but it could just as easily have been one that was purchased. The POS was a system purchased from the industry leader in that particular type of equipment, who happened to be headquartered close enough that representatives of the company's product team management were regularly visiting.
What exactly drove this need for new functionality? In short, a business partner had changed the rules on them. Not a bad change, but one the company really wasn't prepared for. It was called 'PCI' compliance. PCI is short for 'Payment Card Industry', and the consequences for non-compliance were frightful. Companies that failed to comply were not allowed to accept credit-cards for payment, or would be liable for large fines, if they only partly complied.
All in all, something to avoid. As it turns out, use of credit as payment is so large a percentage of this company's receivables – that it was deemed impossible to operate without it. A solution HAD to be found.
So late in 2005, I was tasked with finding a solution to this problem, with a January 2006 deadline for compliance! (and yes, an earlier start could and should have occurred – but that's fodder for yet another talk...).
In all truthfulness, the legacy system was on its way to being replaced – but the first site to be converted was not due to 'go live' until late 2006! We conferred with the replacement application's vendor – and found that there was no possible way to alter the deadline THAT much, let alone in that direction! Besides, they were still in the 'blueprint' phase. The 'realization' phase wouldn't start for at least another year! (which might, for some of you, give some clues as to who the vendor is...)
How to find a solution? As it turns out, apparently what we were struggling with wasn't that uncommon, and because of that – the consultancy that was hired by Visa to do compliance verification was also tasked with assisting us in 'getting from here to there'. In all fairness to our legacy application – pretty sweeping changes were required company-wide in order to achieve PCI compliancy: from network structure, to methods used to access POS systems (and who was allowed to do so), duration of retention of certain types of data – the list goes on and on.
The solution chosen for the POS end (which happened to be SCO Unix based), was a version of a package called OpenSSH – which is an open source implementation of a 'secure shell' which supported, among other things: public-key exchange authentication, encrypted communication, and secure file transfers. This choice was made independent of whatever solution might end up being used for the retail 'back-end', as a secure method of communication was required in order to satisfy transmission of credit-card data from the POS systems to the payment processing company.
As part of the solutions review process - OpenSSH was suggested for the 3000 end of things as well, and after a few searches on the 3000-L, I was surprised and delighted to find that a version compatible with that chosen for use on the POS had already been ported by Ken Hirsh. In case you're wondering, the retailer didn't do any explicit handling of sensitive data on the 3000s, but as part of the PCI compliance rules, any system that communicated with a system that did – had to use a secure communication protocol with restricted access. Also, any systems that had authenticated communications between them had to use different user-ids than those an 'end-user' would use, and the passwords for them had to change – at a minimum – once every 30 days. Given how many POS systems were being polled for data, having an authentication system that required changing all the passwords every 30 days would have made for serious administration issues. That, and the support staff wasn't allowed to know what that password was.... making changing that password all the harder. The plan was to use 'public/private key-exchange authentication' (called 'KEX', for short), because that relieved the operations staff of needing to know what the POS system's passwords were – as they weren't used by the application anymore! The 'application user' login on the POS system could have its password changed (every 30 days, as specified), but the 'key-exchange' authentication would continue to work.
For those unfamiliar with OpenSSH, and that have an interest in it – I'd suggest visiting the OpenSSH.org website. You'll find that OpenSSH is developed on the OpenBSD platform, and then after a particular version is released for OpenBSD – a separate team of programmers makes the changes for what are called the 'portable' versions of OpenSSH. The version I originally implemented for this retailer was v3.7.1p2 – which is the 'second patch' of the portable version of OpenBSD's 3.7.1 OpenSSH, and as of this writing (spring 2008) – is still the newest version available for the 3000.
OpenSSH, for the un-initiated, requires the POSIX shell environment for the MPE version. This is a limitation shared by any application compiled with the GNU C/C++ compiler. That is to say, it won’t run at the MPEiX CI. OpenSSH also has a number of other dependencies, including OpenSSL (available from HP's Jazz website), Perl (also available from Jazz), several specific Perl 'add-in' modules (available from CPAN – the comprehensive Perl Archive Network), the GNU Compiler Tools (C & C++ compilers – again – available from Jazz) to compile Perl, the Perl add-in modules, and OpenSSH itself.
As it turns out, in order to provide protection against a particular type of security threat called 'man-in-the-middle' attacks - OpenSSH prefers DNS vs. hard-coded IP addresses for connections. This is because it actually stores the DNS name AND the IP address when first making an authenticated connection – and will refuse to re-connect without permission – if that pair (name & IP) changes. This made for an excellent excuse to do something the retailer had never gotten around to: enabling name resolution on all their 3000's. To check this, there's a handy script written by Jeff Vance (now retired from HP, and working at QSS) called dnscheck.sh – and it too, is available from Jazz. It is a truly handy script – as it walks you step by step through getting DNS working properly, including specific information on what needs to be changed when it finds something wrong.
OpenSSH also requires a Pseudo Random Number Generator (PRNG) – to provide a stream of random numbers. These random numbers should have certain important properties to ensure system security:
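The 'remember the pair, refuse if it changes' idea can be sketched in a few lines of Python. This is only an illustration of the rule as I've described it, not OpenSSH's actual code or file format. The dictionary stands in for the known-hosts file, and the host name, address and key fingerprint values are invented for the example.

```python
class HostChanged(Exception):
    """Raised when a known name shows up with a different IP address or key."""


def check_host(known, name, ip, key):
    """Record a host on first contact; refuse it later if anything changed.

    `known` is a plain dict standing in for the known-hosts file
    (an assumption made for this sketch).
    """
    if name not in known:
        known[name] = (ip, key)      # first connection: remember the pair
        return "added"
    if known[name] != (ip, key):
        # Same name, different address or key: possible man-in-the-middle.
        raise HostChanged(name)
    return "ok"


if __name__ == "__main__":
    known = {}
    print(check_host(known, "pos001.example.com", "10.0.0.5", "fingerprint-1"))
    print(check_host(known, "pos001.example.com", "10.0.0.5", "fingerprint-1"))
```

This is also why working name resolution matters: if a store's name resolves to a different address from one day to the next, every poll of that store stops until someone gives permission.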
It should be impossible for an outsider to predict the output of the random number generator even with knowledge of previous output.
The generated numbers should not have repeating patterns which means the PRNG algorithm should, at minimum, have a very long cycle length.
A PRNG is normally just an algorithm where the same initial starting values will yield the same sequence of outputs. On a multiuser operating system there are many sources which allow seeding the PRNG with random data. The OpenBSD kernel uses the mouse interrupt timing, network data interrupt latency, inter-keypress timing and disk IO information to fill an entropy pool. In MPE, the RAND compiler library routines were used. This was unfortunate – as they're really not very random. In fact, they're very predictable. I'll be writing a piece for the 3000Newswire that demonstrates that fact in the coming months. Because of this – a different PRNG is required.
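Here is a small Python sketch of why a simple library generator is a poor fit for cryptography. It uses a linear congruential generator with the constants from the sample rand() in the C standard. This is NOT the MPE RAND routine, just a stand-in of the same general family, and it hands out its full internal state to keep the example short.

```python
def make_lcg(seed, a=1103515245, c=12345, m=2**31):
    """Return a function that yields successive values of a simple LCG."""
    state = [seed]

    def next_value():
        state[0] = (a * state[0] + c) % m
        return state[0]

    return next_value


def predict_next(last_output, a=1103515245, c=12345, m=2**31):
    """An outsider who saw one output can compute the next one directly."""
    return (a * last_output + c) % m


if __name__ == "__main__":
    first = make_lcg(42)
    second = make_lcg(42)
    # Same seed, same sequence, every time.
    print([first() for _ in range(3)] == [second() for _ in range(3)])
    seen = first()
    print(predict_next(seen) == first())
```

Both properties in the list above fail here: the same seed replays the same stream, and a single observed value gives away everything that follows.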
As it turns out, MPE isn't the only system with this problem, so an alternative is available. Ken Hirsh packaged in a tool called the “Entropy Gathering Daemon” - or EGD for short. It's a Perl program that searches various sources of pseudo-random data on your server – and 'stirs them up' in such a way as to make for a fairly good source of random numbers. And if cryptology needs one thing (besides LOTS of cpu cycles), it's a good source of random numbers.
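The 'stirring' idea looks roughly like the Python sketch below. To be clear, this is not EGD's code or its actual algorithm. It only shows the general shape of an entropy pool: samples from various sources are hashed into the pool, and output is derived from the pool without exposing it. The choice of SHA-1 and the sample strings are assumptions made for the example.

```python
import hashlib


class EntropyPool:
    """Toy entropy pool: hash samples in, hash output blocks out."""

    def __init__(self):
        self._pool = b"\x00" * 20
        self._counter = 0

    def stir(self, sample):
        """Mix a sample (bytes) into the pool by hashing pool + sample."""
        self._pool = hashlib.sha1(self._pool + sample).digest()

    def read(self, nbytes):
        """Return nbytes of output derived from the pool."""
        out = b""
        while len(out) < nbytes:
            self._counter += 1
            out += hashlib.sha1(self._pool + str(self._counter).encode()).digest()
        # Fold the output back in so the same bytes are not handed out twice.
        self.stir(out)
        return out[:nbytes]


if __name__ == "__main__":
    pool = EntropyPool()
    for sample in (b"output of a process listing", b"network statistics"):
        pool.stir(sample)
    print(pool.read(16).hex())
```

The quality of the output depends entirely on how unpredictable the samples are, which is why adding more data sources comes up again later in this story.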
The PCI consultants were a mixed blessing – in that they helped with some setup & permissions/ownerships related issues – but they also insisted on using the newer v1.2.3 zlib, due to a ‘reported weakness’ (the 'double-inflate' flaw) in the v1.1.3 zlib that ships with MPE’s Posix. Zlib is a compression library, and is used by OpenSSH whenever a connection with compression is requested. In the copy of this talk that will be posted on the GHRUG website, or available in the supplementary materials available to any of the attendees that want it – I'll provide links that document the weakness, and a pretty cool Perl script that can identify the versions of either statically bound copies of zlib, or whatever the dynamic versions on your server might be.
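Checking a version against a required minimum is easy to get wrong if you compare the strings directly ('1.2.11' sorts before '1.2.3' as text). A minimal Python sketch of a numeric comparison follows. It is not the Perl script mentioned above and does not scan binaries. The one real library value it looks at is zlib.ZLIB_VERSION, which reports the zlib that Python's own zlib module was built against.

```python
import zlib


def version_tuple(text):
    """'1.2.3' -> (1, 2, 3); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(found, minimum):
    """True when version string `found` is numerically >= `minimum`."""
    return version_tuple(found) >= version_tuple(minimum)


if __name__ == "__main__":
    print(zlib.ZLIB_VERSION, is_at_least(zlib.ZLIB_VERSION, "1.2.3"))
```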
Ok – so that all sounded like a lot of work, getting all this new software installed and working. Well, again – there's now another resource that makes doing this much easier – because it provides 'follow the dots' instructions. They're found on the Beechglen Development website, and are really helpful.
Now comes the fun part – integrating this into the existing application.
As you might recall, the whole reason for doing this exercise – was to implement 'secure' communications between the POS systems, and the 3000 that uses the data from them. The software package this retailer uses is called 'POS/3000' from STRSoftware. This package had originally been set up to do bisync, then later 'serial' (as in – RS-232) polling of the POS systems – by emulating a login over a serial-port, and executing particular scripts/commands to collect up and retrieve the required data in order to properly capture stock-movement data. A later enhancement was to replace the 'serial' connection with a network connection. In a nutshell, this was done using the 'remote shell' (remsh) in order to remotely execute commands, and ftp in order to put or get files on the POS system. This is when we found out something quite unfortunate.
While OpenSSH's sftp (secure ftp) functionality was directly equivalent to the ftp commands being used for network based polling, the ssh 'secure shell' equivalent to the 'remsh' remote shell just didn't work properly in the MPE port. As it turns out, the problem was that it was difficult to get the ssh command-line to reliably return results and status values back – when it executed commands remotely. This wasn't due to the work Ken did in porting this, but rather was due to a known issue with the POSIX environment on MPE. This was a REAL problem for this project, as it prevented doing a direct ftp-->sftp, remsh-->ssh command 'swap out'. To make matters worse, while sftp allows limited access to commands on the remote system – it does NOT allow executing remote scripts or commands besides 'ls', 'chmod', 'chown', or 'rm' (or the standard ftp 'put' and 'get' commands).
We went through several possible work-arounds but ended up with a fairly simple 'cron' script resident on the POS, that would execute specifically named scripts for us. It would be run once a minute, 24x7. We originally looked into making a light weight 'daemon' process to remotely execute commands for us – in the belief that it would consume fewer system resources than the 'scripted' alternative. After reviewing time-to-implement, and actual overhead of the script to be used, we found something that was a bit surprising. By first checking for the presence of a key script, and if it was not present - just stopping until the next minute, we could reduce the actual load on the POS to nearly unmeasurable levels. What we had forgotten was that process creation is a much lower overhead activity on most varieties of Un*x than it is in MPE-Land.
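The logic of that once-a-minute check is simple enough to sketch. The real thing was a script on the POS, so the Python below is only an illustration of the control flow. The trigger file name, the '.status' file and the 'runner' callable are all invented for the example.

```python
import os


def run_pending(spool_dir, runner, trigger="run_me.sh"):
    """One cron tick: if the trigger script is present, run it once and remove it.

    `runner` is any callable that takes the script path and returns an exit
    status. Returns None when there was nothing to do.
    """
    path = os.path.join(spool_dir, trigger)
    if not os.path.exists(path):
        return None                  # the cheap, common case: nothing requested
    status = runner(path)
    # Leave the result where the other side can fetch it with sftp 'get'.
    with open(path + ".status", "w") as f:
        f.write(str(status))
    os.remove(path)                  # so the next tick does not run it again
    return status
```

The polling side 'puts' the trigger script with sftp, and almost every tick ends at the first 'return None', which is why the load on the POS was so low.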
There are drawbacks of a 'non-interactive' communication protocol, however. One of these was that there was no way of knowing for sure how things were progressing. The prior interface design had used 'remsh' – which would 'block' (wait) until the remote command finished. Our 'remote' scripting method couldn't do that. We also ran into issues with how to communicate back to POS3000 (the polling control software), how the poll had gone – and what problems, if any, had occurred. This was compounded by having a variable number of commands that could be executed on a particular poll-cycle.
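Since nothing 'blocks' any more, the polling side has to go back and ask whether the work is done, and give up after a while. A minimal Python sketch of that wait loop follows. The 'check' callable is a stand-in for 'fetch the status file and see what it says', and the timeout and interval values are arbitrary assumptions, not the ones used in POS3000.

```python
import time


def wait_for_result(check, timeout, interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns something other than None, or time runs out.

    Returns the result, or None if the timeout passed first.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            return None              # caller decides whether to retry or report
        sleep(interval)
```

With a variable number of commands per poll-cycle, each command needs its own status to check, which is part of what made reporting back so fiddly.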
We tried using '!setvar' from within the POSIX shell, but found that this didn't work properly. Turns out that this is a byproduct of the design of certain MPE commands, and not directly a problem with the POSIX shell, per se. Because the shell expects that all commands can do CIOR (command I/O redirection) – it cannot work properly with commands that cannot. As it turns out 'setvar' – which we were trying to use to communicate back to POS3000 various 'I got this far' sort of messages – via the MPE-CI suffers from just this issue. In essence – there appeared to be no way to pass information back from 'POSIX-Land' to the MPE-CI. After much wrangling with it, we found that by 'wrapping' the offending commands in a command file (which can do CIOR), we could 'fix' the problem. As it turns out – there are a number of commands that have this issue – which is documented in the supplemental materials for this talk, and will be the subject of another article in the future.
As it turns out, it was also necessary to 'tweak' how much data a poll-cycle could pull back at a time – and based on that, how long to wait for that data to be collected. We also found that it was necessary to find a way to manage the 'throttling' (how many polls of POS systems could be concurrently running), due to limitations of the EGD's ability to generate random numbers in sufficient volume. This was in spite of some modifications I made – adding additional data sources and setting the 'high/low' limits on the entropy pool. There were also limitations in how quickly EGD could accept socket connections (also a known limitation on other platforms!). It was also necessary to make sure that the polls would automatically 'retry' if there were any failures – reporting the reason and – when possible, trying to automatically correct any problems found along the way.
What did they end up with? A secure polling solution that has worked reliably for over 2.5 years now, handling well over $800m in sales (so far), which was contained in hundreds of GB of polled data. Was it worth the effort? Yes I'd say so. Especially given that the retail system replacement project that was supposed to begin the 'rollout' phase by late 2006 ended up being over 9 months late, and its exact completion date is still uncertain.
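Throttling and retrying fit together naturally: cap how many polls run at once, and let each poll try a few times before reporting failure with the reason. Here is a minimal Python sketch of that shape. It is not how POS3000 does it. The limit of 4 concurrent polls, the 3 attempts and the 'poll_one' callable are assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def poll_with_retry(poll_one, store, attempts=3):
    """Try one store up to `attempts` times; return (store, ok, last_error)."""
    last_error = None
    for _ in range(attempts):
        try:
            poll_one(store)
            return (store, True, None)
        except Exception as exc:     # record the reason, then try again
            last_error = str(exc)
    return (store, False, last_error)


def poll_all(stores, poll_one, max_concurrent=4, attempts=3):
    """Poll every store, never running more than max_concurrent at once."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(
            lambda store: poll_with_retry(poll_one, store, attempts), stores))
```

The right value for the concurrency limit is whatever the slowest shared resource can sustain. In our case that was the supply of random numbers, not the network or the CPU.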
In the mean-time, OpenSSH has provided a secure and reliable method to communicate with their POS systems – and allowed achieving PCI compliancy.
Before I wrap this up and answer any questions – I'd like to thank all those whose efforts have made this possible, including Ben Bruno of STRSoftware – for his exceptional work in modifying POS3000 to support secure network polling. Thanks to Ken Hirsh for his port of OpenSSH – without which, this would have been a presentation on 'How I spent my summer porting OpenSSH', and thanks to Mark Klein for his port of the GNU toolset, without which – none of this would have been possible.
http://www.ciac.org/ciac/bulletins/m-062.shtml <—ciac’s advisory page
http://www.gzip.org/zlib/advisory-2002-03-11.txt <—zlib’s advisory page
http://CERT.Uni-Stuttgart.DE/files/fw/find-zlib <—find what zlib’s on your system…
put in /usr/local/bin; chmod it, and in shell, cd to /usr/local/bin, then invoke as: perl find-zlib -v *
(will give list of what uses zlib/libz & what versions)
RESOURCES: (not in order of importance, nor should any endorsement be inferred by inclusion/exclusion or relative placement…)
Free Software Foundation – FSF.org
GNU.org – GNU's Not Unix (home of many open source applications, many of which are already ported to MPE)
OpenBSD.org – basis for many other open source applications, most notably for this paper: OpenSSH.
3000-L mailing list <--mirrors of each other--> comp.sys.hp.mpe newsgroup
Jazz – jazz.external.hp.com – home of most 'pre-packaged' open source applications, except for those already included as part of FOS.
(previously maintained by Jeff Vance of HP, now retired – presently working at Quintessential School Systems (QSS) in San Mateo)
Mark Bixby’s ports: www.bixby.org/mark
(private work of Mark Bixby of HP, also now retired – but the foundation of many other ports. He's also published a number of useful presentations and guides for those who want to venture into open source land to do their own porting projects. Mark is presently working at Quintessential School Systems (QSS) in San Mateo as well)
Lars Appel’s ports: http://www.editcorp.com/personal/lars_appel/
(english homepage: http://www.larsappel.de/1.html – an impressive list of ports...)
3k Associates: www.3k.com <- native email solutions & much freeware
Allegro Assoc: www.allegro.com <- native tools for 3000 & 9000, & much freeware
Beechglen Development: www.beechglen.com
(whitepapers, freeware, consulting, & paid support – and ‘quick’ how-tos on setting up basic ssh/sftp/scp, using Ken Hirsh’s port; look under 3000 whitepapers, under ‘security’ and sftp setup)
O’Reilly – publishers of MANY useful tomes – in this project’s case:
OpenSSH (for this one, be sure to get the 2nd edition, or later).
General porting: Porting Unix Software by Lehey – very useful to help learn and understand the various – 'quirks' that may present themselves, depending upon what version of Un*x the package is coming from, or going to. Also, currently out of print (of course).
www.OpenSSH.org <-- official home of the OpenSSH package
www.OpenSSL.org <-- official home of the OpenSSL package
https://www.pcisecuritystandards.org/index.htm <—PCI Security Standards Council