AMD Processors
Topic Title: Does Opteron do UNIX?
Created On: 12/02/2003 10:02 PM
 12/04/2003 07:08 PM
Blackhawk
Member

Posts: 81
Joined: 12/02/2003

QUOTE (jes @ Dec 4 2003, 03:54 PM)Hmmm, I'm guessing you mean the S2885, which is the K8W?
I looked over at Tyan's site, and it's the Thunder K8W (S2885) that's got my eye.

If I can dual boot, there won't be any problem. Going forward, I still expect 99% of my work to be on Windows. Unless, of course, *-ux just really makes a fanatic out of me!

(Must remember, Windows pays the bills around here. Must remember, Windows buys the beans. Maybe have to figure out a way for *-ux to pay some bills and buy some beans.... )
 12/04/2003 08:15 PM
deuxway
Member

Posts: 193
Joined: 10/08/2003

QUOTE (Blackhawk @ Dec 4 2003, 06:30 PM)deuxway and jes:

You two are magnificent! Thanks for all the help, but it'll take me the rest of the year to follow up on all the tips, tricks, and hints you've given me! I really appreciate the help!
Blackhawk, I'm more than happy to help, but I think Jes's offer was the winner.

I'm a tad green with envy (and Houston mold) over your 24*8* machine.

If you run Cygwin on Windows on it, you'll likely get a console app ported to Cygwin's Linux APIs pretty quickly. Then dual booting will give you easy access to the *same source* from both environments (if you use FAT partitions). You'll probably find you can have a common makefile. Very nice.
 12/04/2003 08:38 PM
Blackhawk
Member

Posts: 81
Joined: 12/02/2003

QUOTE (deuxway @ Dec 4 2003, 05:15 PM)I'm a tad green with envy (and Houston mold) over your 24*8* machine.

If you run Cygwin on Windows on it, you'll likely get a console app ported to Cygwin's Linux APIs pretty quickly. Then dual booting will give you easy access to the *same source* from both environments (if you use FAT partitions). You'll probably find you can have a common makefile. Very nice.
Don't turn green yet, 'cause I don't have it ... yet. I'm still dithering about the components, but when I decide, it's a go!

Too bad Linux (apparently) doesn't run on NTFS partitions. Guess I'll have to figure out how to live with that.
 12/04/2003 08:58 PM
jes
Senior Member

Posts: 1134
Joined: 10/22/2003

QUOTE (Blackhawk)
Too bad Linux (apparently) doesn't run on NTFS partitions.  Guess I'll have to figure out how to live with that. 


Linux will quite happily read from an NTFS partition, just be sensible when you partition your hard drive(s) and you shouldn't have any problems.

-------------------------
The opinions expressed above do not represent those of Advanced Micro Devices or any of their affiliates.
http://www.shellprompt.net
Unix & Oracle Web Hosting Provider powered by AMD Opterons
 12/04/2003 09:55 PM
Blackhawk
Member

Posts: 81
Joined: 12/02/2003

That's half the problem solved! Does Linux also write to NTFS partitions?
 12/04/2003 11:09 PM
deuxway
Member

Posts: 193
Joined: 10/08/2003

I didn't know of a Linux that supported writing NTFS, but I just googled the Linux NTFS project, which put out a release last month.

I also did a search at SuSE (http://www.suse.com/cgi-bin/htsearch.cgi) for "NTFS". There seem to be a lot of AMD64 references, so you may be in luck. I've never used this.

I know a FAT/FAT32 partition isn't ideal, but I'm not sure why it wouldn't be good enough. It's only going to need your application source, binaries, and tests. You could keep your source code revisions, and production data safe somewhere else.

I've been chatting to a development chum, at Berryhill Tamales, yum, about your problem, and he had some more thoughts.

If you want to know how much Windows-specific stuff you've got, try:
a. commenting out stdafx.h and windows.h, and seeing how many *different* APIs show up (there's a rough sketch of this below).
b. getting a link map from the VC++ linker and looking at the DLLs; if you post a list, we could probably say which ones are problems.
c. running exehdr (we think) on your executable. It will do the same as getting a map from VC++. Again, look at the DLLs to see if there are problems.
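
To make (a) concrete, here's roughly what we mean (an untested sketch; AUDIT_NO_WIN32 and process_folder are names we've made up). Build once with the macro defined (/DAUDIT_NO_WIN32 in VC++, -DAUDIT_NO_WIN32 with gcc) and count how many *different* undeclared identifiers the compiler complains about:

#ifndef AUDIT_NO_WIN32
  #include <windows.h>     // the normal build
#endif
#include <cstdio>

void process_folder(const char* path)
{
    // With AUDIT_NO_WIN32 defined, <windows.h> is skipped and every Win32
    // name below turns into an "undeclared identifier" error: that error
    // list is exactly the list of APIs you'd have to replace for a port.
    WIN32_FIND_DATAA fd;
    HANDLE h = FindFirstFileA(path, &fd);
    if (h != INVALID_HANDLE_VALUE)
        FindClose(h);
    std::printf("scanned %s\n", path);
}

The number of distinct names, not the number of call sites, is what tells you how big the porting job really is.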

His other thought, which is even easier than Cygwin, is to download Borland C++BuilderX Personal edition (http://www.borland.com/product...ownload_cbuilderx.html); you don't need the fancy stuff in Enterprise for your console app, so keep things simple. You can get a 30 (or maybe 90) day trial for free, which should be enough time to evaluate the scale of the job, and maybe do it.

C++BuilderX runs on Windows, Linux and Solaris. So the same IDE and makefiles will work on each platform.

It comes with gcc. So, if you do the trick with the header files, on Windows, you'll be able to get the thing Linux-API clean on Windows.

Borland C++BuilderX lets you plug in other compilers, so you can plug in a gcc that supports AMD64 (we don't know whether the one it ships with has AMD64 support; I couldn't tell). Here's a C++BuilderX datasheet: http://www.borland.com/cbuilderx/pdf/cbx_techview.pdf

{ Adults only - His other suggestion is that he's willing to take contracts, so he would be willing to do the port. I could put you in contact with him. He's very good. He has developed commercial mission-critical (oil and chemical industry) products, as well as a well-used Ant plugin and a w3c DOM test suite. Email me if you want to follow up. }

Watch out for a free Borland C++BuilderX CD in magazines. I think I got mine with Dr Dobbs. It'll save you the download time if you have a slow link.

Anyway, we hope this helps.
 12/04/2003 11:54 PM
Blackhawk
Member

Posts: 81
Joined: 12/02/2003

Thanks, deuxway. Those are also great tips, and things look encouraging!

The main impetus to have a dual boot Win/*nix machine running on NTFS partitions is for validation testing and generation of marketing hype. This app takes about 50 hours to crunch one set of test data on a 1.7 GHz AthlonXP, with the slowness primarily due to Windows file handling limitations. The theory is that 32 bit UNIX will cut that drastically and 64 bit UNIX will make the test positively fly! But to do it apples to apples, the hardware has to be similar, and ideally the tests should be run on the same hardware. After some reliable numbers are generated, they can be extrapolated for when the app's running on some serious hardware. NTFS may not be the best for UNIX, but it is for Windows, and that's what I'm stuck with testing against. (Is there something better for UNIX...?)

I'm just astounded with all the tightly focused help you, your chum, and jes have provided! It's nice to be reminded every once in a while of why I like this business -- some great folks are in it!
 12/05/2003 12:26 AM
deuxway
Member

Posts: 193
Joined: 10/08/2003

Blackhawk, good news I think. Maybe you haven't considered how dual booting usually works.

You have both NTFS and *nix partitions on the same machine. They may share a drive, be spread over many drives, or share RAID arrays. It doesn't much matter. You don't lay down a *nix OS on a Windows file system (e.g. NTFS) or vice versa.

That's precisely what I have on this little Shuttle desktop (P4) machine, which has a single IDE drive. It takes a little extra planning, and some upfront partitioning (30 minutes work), but it's all okay after that. You might want to buy a copy of Partition Magic; I like the graphical partition maps, though I did a Red Hat machine with Norton ghost tools (gdisk I think). But you could use the partition tools that come with Linux.

I have a couple of Windows NTFS partitions (one for OS & applications, and the other for work), a couple of native Linux (SuSE) partitions (and swap is separate on *nix), and a FAT32 partition. If you want more details, I can post.

I can read and write the FAT32 partition from either OS.

When you run Windows you'll have almost everything it wants on NTFS.
When you run *nix, you'll have almost everything for it on native file systems. IMHO that's the best way to get a fair comparison.

The FAT32 partition is there so that you can share source, test data etc. easily.

With SuSE, all of your Windows partitions can be made available. My SuSE only lets me read NTFS, but I can read/write FAT/FAT32, and the links I gave may be better.

When you do your tests and demos they will be running on pukka, native, high-performance file systems.

What do you mean by "Windows file handling limitations"?

Do you mean the application is file-system bound? Have you profiled it to see where it spends most of its time, or tried to "simulate" file system I/O and got blazing performance? I'm raising this as I/O-bound systems will get more benefit from a fast disk subsystem than fast CPUs or even a 64bit OS. It is traditionally true that UNIX systems have very good I/O throughput, but part of that was down to price: those systems got fast SCSI/RAID disk subsystems.
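
To make the "simulate the file system" idea concrete, here's the sort of crude comparison I mean (an untested sketch; process_block() is just a stand-in for your real per-file work, and the file list comes from wherever your app gets it). If pass 2 finishes in a fraction of the time of pass 1, the file system really is your bottleneck:

#include <cstdio>
#include <ctime>
#include <string>
#include <vector>

static long checksum = 0;                          // keeps the work honest

static void process_block(const std::vector<char>& buf)   // stand-in for the
{                                                          // real per-file work
    for (size_t i = 0; i < buf.size(); ++i) checksum += buf[i];
}

void compare_io(const std::vector<std::string>& files)
{
    std::vector<char> buf(25 * 1024);              // roughly one file's worth

    std::time_t t0 = std::time(0);
    for (size_t i = 0; i < files.size(); ++i) {    // pass 1: real disk reads
        if (std::FILE* f = std::fopen(files[i].c_str(), "rb")) {
            std::fread(&buf[0], 1, buf.size(), f);
            std::fclose(f);
        }
        process_block(buf);                        // plus the real work
    }
    double with_io = std::difftime(std::time(0), t0);

    t0 = std::time(0);
    for (size_t i = 0; i < files.size(); ++i)      // pass 2: same work, no disk
        process_block(buf);
    double no_io = std::difftime(std::time(0), t0);

    std::printf("with I/O: %.0fs   without I/O: %.0fs\n", with_io, no_io);
}

Wall-clock seconds are plenty of resolution here, given runs measured in hours.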

Depending on your application, you may want to use some specialised ways to do I/O.

My chum and I are pretty good at this, so again, you're welcome to ask for more help.
When we get too busy, we'll stop answering.
 12/05/2003 06:36 AM
jes
Senior Member

Posts: 1134
Joined: 10/22/2003

Just to throw more ideas into the mix... (you can never have too many options before you start off!)

Another alternative to partitioning between Windows and Linux/BSD might be to investigate VMware. If you're not already aware of it, this allows you to have a "virtual operating system" within a currently running one. So say (for the sake of argument) you have your machine already set up running Windows: you fire up VMware and configure it so that you can install Linux, and then from within your Windows environment you get a full installation of Linux which can be as isolated as you like (i.e. you can restrict or grant access to the host's network interfaces, restrict or grant access to the host's hard disks, etc).

I'm not saying this is a great *final* solution, but while you're testing it's a quick way to get up and running. I've used VMware for quite a while now, and I'd have to put it in my list of top 5 applications I've *ever* used.

Hmmmm..... one thing I've just thought of though: what happens if you run a 32bit Windows host on an AMD64 machine and then create a 64bit Linux machine within it? Now that'd be interesting!

Anyway....it's an option.

Just out of interest, regarding the NTFS side, I've just checked the kernel configuration on my Opteron box and found the following declarations in the kernel .config file:

# CONFIG_NTFS_FS is not set
# CONFIG_NTFS_RW is not set

Obviously the first one is to allow NTFS filesystem access; the second one looks very interesting.... I can't see it described as "experimental" anywhere, but I haven't dug around that much.

As Deuxway points out though, it would probably be simpler and less risky to have a separate FAT partition or something that you just use to share data between the two OSes.

Hmmm, regarding the rewriting of your app to 64bit/Linux: I've always found that this is the best time not only to do a code translation but also to re-investigate the algorithms that you're using. As an example, I had a piece of 32bit code that took around 7 hours to run; rewriting that as 64bit code took it down to just over 4 hours (which was pretty impressive), and re-working the algorithms brought the time down to under an hour (not because I'd discovered some secret great new algorithm, just that usually you don't code things the best way you could have the first time around).

You might be disappointed if you're relying purely on a change of OS to speed up the runtime of your app; you should be looking to *optimize* your app for the architecture as much as possible. Sure, you'll get great performance from the Opterons... but you'll get amazing performance if you target your code carefully.

Again as another example (which I've posted elsewhere in these forums, so excuse me for repeating it): we had some code that runs literally trillions (seriously) of calculations on various atoms. We ran the (tweaked and optimized) code on my dual 244 Opterons (1.8GHz) and another dual 2.8GHz Xeon machine. Despite the Xeon having a 2 day headstart, the Opteron completed the run a week before the Xeon did (Opteron total run time a week, Xeon total just over 2 weeks).

Oh, and to reiterate Deuxway's very useful point: profile your code! The bottlenecks may not be where you *imagine* them to be... you could get a big surprise. Generally (as a BIG generalisation), in the majority of code, 80% of the time is spent running 20% of the code. What this means in effect is that small optimizations in the right place can have a HUGE impact on performance. That is something you can only determine with proper profiling.
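
If you don't have a proper profiler to hand, even a crude scoped timer dropped around the suspect blocks will usually point the finger. A rough, untested sketch (note std::clock() is CPU time on most Unixes and wall time on Windows, so treat the numbers as a guide only):

#include <cstdio>
#include <ctime>

class ScopedTimer {
public:
    explicit ScopedTimer(const char* label)
        : label_(label), start_(std::clock()) {}
    ~ScopedTimer() {
        double secs = double(std::clock() - start_) / CLOCKS_PER_SEC;
        std::printf("%s: %.2f s\n", label_, secs);   // prints on scope exit
    }
private:
    const char*  label_;
    std::clock_t start_;
};

void crunch_one_file(const char* name)
{
    ScopedTimer t("crunch_one_file");
    // ... open, read, index, write results for 'name' ...
    (void)name;
}

Drop one of these into each suspect routine and the 20% doing 80% of the work usually shows itself very quickly.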

Hope this helps!

-------------------------
The opinions expressed above do not represent those of Advanced Micro Devices or any of their affiliates.
http://www.shellprompt.net
Unix & Oracle Web Hosting Provider powered by AMD Opterons
 12/05/2003 12:14 PM
Blackhawk
Member

Posts: 81
Joined: 12/02/2003

A MUCH simpler way to get an idea of the Windows file handling limitations than what I'm going to say below is to seed two folders, one with 5,000 files and another with 100,000. Using Windows Explorer, double click on each folder and measure the time it takes to display the contents. Their ratio is FAR worse than 20:1.

A hard way you can see the Windows file handling limitations is by setting up a relatively simple test. Make an app that lists all files in a folder, then opens each one, makes a copy of it under a reusable name, opens and reads the last 500 bytes of the copy, and writes a new file containing those bytes, named as a derivative of the original file. When that's completed, open the next file, copy it under the reusable name (overwrite), and continue the process. At the end of the process, delete each of the derivative files and the last copy with the reusable name, leaving the folder contents as only the original, unmodified files. Load the folder with 5,000 clones of a single 25k file. Run the app and note the time to complete. (Defrag as it suits you.) To expose the limitations, increase the number of clones to 100,000, and run the app again.

You'd think the times would scale. So did I! On one of the original tests with real data, the first phase took 30 minutes and the second took 50 hours! After several iterations of optimization, the first phase took 15 minutes and the second about 23 hours. Certainly not 20:1 as I expected -- more like 100:1. However, it must be noted that the real data tests involved actually processing the data, which was severely and adversely affected by the CPU time devoted to I/O. Without processing, the test would be confined to testing I/O without being hamstrung by actually having to do something useful.
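
For anyone who wants to reproduce it, the skeleton of that test looks something like this. This is a rough Win32/stdio sketch, not my production code; error handling is trimmed, and you'd time the run_file_test() call yourself:

#include <windows.h>
#include <cstdio>
#include <string>
#include <vector>

void run_file_test(const std::string& folder)
{
    // 1. List every file in the folder.
    std::vector<std::string> names;
    WIN32_FIND_DATAA fd;
    HANDLE h = FindFirstFileA((folder + "\\*").c_str(), &fd);
    if (h == INVALID_HANDLE_VALUE) return;
    do {
        if (!(fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY))
            names.push_back(folder + "\\" + fd.cFileName);
    } while (FindNextFileA(h, &fd));
    FindClose(h);

    const std::string scratch = folder + "\\__scratch.tmp";
    std::vector<std::string> derived;
    char tail[500];

    // 2. For each file: copy it to the reusable name, read the last 500
    //    bytes of the copy, and write them to a derivative file.
    for (size_t i = 0; i < names.size(); ++i) {
        CopyFileA(names[i].c_str(), scratch.c_str(), FALSE);   // overwrite copy
        std::FILE* f = std::fopen(scratch.c_str(), "rb");
        if (!f) continue;
        std::fseek(f, -long(sizeof tail), SEEK_END);
        size_t got = std::fread(tail, 1, sizeof tail, f);
        std::fclose(f);

        std::string out = names[i] + ".tail";
        if (std::FILE* o = std::fopen(out.c_str(), "wb")) {
            std::fwrite(tail, 1, got, o);
            std::fclose(o);
            derived.push_back(out);
        }
    }

    // 3. Clean up, leaving only the original, unmodified files.
    for (size_t i = 0; i < derived.size(); ++i)
        DeleteFileA(derived[i].c_str());
    DeleteFileA(scratch.c_str());
}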

Even sloughing the copies to another folder doesn't "cure" the problem, which is apparently the MFT being a single hierarchy incapable of efficiently dealing with the increased complexity of an increasing number of files. It seems that the I/O speed is inversely proportional to the number of files the MFT has to deal with, which, of course, means that there's a magic number of files for each system that will bring the system to an apparent halt. "A" cure would be a multiple hierarchy for the MFT so that each "layer" only dealt with an "efficient" number of files. Alas, MS apparently hasn't been concerned with us folks who have a need to deal with really large numbers of files.

My app is definitely I/O bound, and about every known and imagined trick to minimize the time required has been implemented and exhaustively tested. That doesn't mean there aren't more things to try, but seemingly the next best course of action is to avoid I/O by being able to access large amounts of memory (64-bit app) and use some of the reputed data crunching talents of UNIX. Of course, most of our customers run UNIX already, and that's the prime motivation for porting it. Anticipated order-of-magnitude performance improvements are just gravy!

My opinions about the limitations of Windows file handling were formed by testing and deductions based on the results. If there's any way to streamline it with the current hardware running 32 bit Windows, I don't know what it is!

Okay, that takes care of that question... for now!

I've used PartitionMagic for years, and I can't imagine doing without it. The current version (8) handles up to 160GB partitions (up from 137GB in version 7), but I really don't see why it has that limitation.

The other great tips in the last two responses will take some research, but they remind me of my brother-in-law the first time he got into the family holiday penny-ante poker game decades ago. He'd never played poker before, and a few hours into the game he uncharacteristically kept raising at every opportunity. When the showdown came, he said "I don't know what I've got, but I think it's good!" and proceeded to lay down a natural Royal Flush. That's the way I feel with all these responses, and thanks again beyond any words I can say!
 12/05/2003 12:35 PM
jes
Senior Member

Posts: 1134
Joined: 10/22/2003

Hmmmm, yes I've come across the "lots of files" phenomenon before, although not to the extent that you've experienced.

At the risk of wandering off topic slightly, could you give a simple and brief description of what your app actually does? I don't mean in depth algorithms etc, but an overview like "it opens a file, processes the contents of that file, writes out five new files, each of those five files is then processed in turn" etc.

If you are writing/reading as many files as you seem to be suggesting then I'm not surprised your app is long-running. Would it not be possible to use one file, and index within that file for the data you need? Is every file required to be output (i.e. is *every* file used)? Or would it be possible to hold the file data in memory (by using memory streams) and then only write out the ones that you needed at the end?
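
To illustrate the "one file plus an index within it" idea, something along these lines (a completely untested sketch; RecordStore is a made-up name, and a real version would also persist the index so you keep your ability to restart):

#include <cstdio>
#include <map>
#include <string>
#include <vector>

class RecordStore {
public:
    explicit RecordStore(const char* path) : f_(std::fopen(path, "w+b")) {}
    ~RecordStore() { if (f_) std::fclose(f_); }

    // Append one record to the single data file and remember where it went.
    void put(const std::string& key, const std::vector<char>& data) {
        std::fseek(f_, 0, SEEK_END);
        Entry e = { std::ftell(f_), (long)data.size() };
        if (!data.empty())
            std::fwrite(&data[0], 1, data.size(), f_);
        index_[key] = e;                 // one seek + one write, no new file
    }

    // Look a record up by key: one seek + one read, no open/close per record.
    bool get(const std::string& key, std::vector<char>& out) {
        std::map<std::string, Entry>::iterator it = index_.find(key);
        if (it == index_.end()) return false;
        out.assign((size_t)it->second.size, 0);
        if (out.empty()) return true;
        std::fseek(f_, it->second.offset, SEEK_SET);
        return std::fread(&out[0], 1, out.size(), f_) == out.size();
    }

private:
    struct Entry { long offset; long size; };
    std::FILE* f_;
    std::map<std::string, Entry> index_;
};

The point being that the directory (and the MFT) only ever sees one file, however many records you push through it.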

Apologies if you think you've exhausted every avenue with your app, but sometimes a fresh pair of eyes spots something (I've lost count of the number of times someone has peeped over my shoulder and said "have you tried...?").

-------------------------
The opinions expressed above do not represent those of Advanced Micro Devices or any of their affiliates.
http://www.shellprompt.net
Unix & Oracle Web Hosting Provider powered by AMD Opterons
 12/05/2003 01:27 PM
deuxway
Member

Posts: 193
Joined: 10/08/2003

Jes, outstanding point. VMware is the "mutt's nuts", as they say.

Blackhawk, you'll probably want VMware (http://www.vmware.com/products/) anyway if you ever decide to sell and support multiple OSes.

Just to add some colour to Jes's explanation for people unfamiliar with VMware ...

VMware allows you to create one or more OS images in files on a disk in the same file system. No partitioning needed, just ordinary disk files in the file system of the host OS. VMware runs any of those OSes under your normal boot OS as if the OS image were just an ordinary application. You can run any application imported into the OS image, so you can even do this to test out different versions of DLLs if you're suffering DLL hell.

VMware runs most versions of Windows under the boot Windows, but it goes further too. You can boot Windows, then VMware can start *any* of many *different* Linux distributions just by loading that OS image, as easily as running Office! You can do the converse too: you can boot Linux and load Windows images. This looks spooky!

This is superb for debugging, testing and support. You create an OS file image of each major and minor version of an OS (NT, NT+SP1, NT+SP2, NT+SP3, and by induction ... :-), then run your automated testing on the versions that you sell and support on. If you get a bug report from a customer, investigate the problem on the version the customer is using. This is so much better than "well, my version of Windows is fine, it must be your version of Windows causing the problem".

It gets better.

You can *simultaneously* run several OS images on one machine, simulating multiple separate servers with only one box, even a single CPU machine. This may run slowly, but it is brilliant for debugging and testing. One of my colleagues simulated a 7 server configuration with only a laptop and a desktop.

Of course, it will run slow, but it will be invaluable if you decide the multiple OS route is the way to go. Get as much RAM as you can afford, it seems to help a lot.

The only 'Fly In The Ointment' (FITO) is VMware don't appear to have AMD64 support yet.

Another thought, sorry. You may already do this, but ...

You may want to get a mobile rack (http://www.axiontech.com/searc...&orginize=&coktype=nor) for your machine, and put each OS on a separate rack. It's just a way to get an exchangeable hard drive.

This is only practical for a small number of OSes; VMware is better for production support of a large number, but you won't have any performance hit with mobile racks, and it's easier to get beyond dual boot.

Excellent call on VMware Jes.
 12/05/2003 02:03 PM
Blackhawk
Member

Posts: 81
Joined: 12/02/2003

Jes, it indexes a multiplicity of user-owned files of any size (up to 2E9 bytes) without making permanent copies of or disturbing the originals. The theoretical maximum number of unique files it can address is also 2E9, but the actual limit under Windows is 1.4E8 much smaller files, disregarding the unacceptable time handicap and hardware storage limitations. (Hmmm... how big of a disk array would it take to store 2E9 files of 2E9 bytes disregarding overhead...?)

VMWare sounds very interesting. It's on the list to check out!

Thanks again!
 12/05/2003 02:16 PM
jes
Senior Member

Posts: 1134
Joined: 10/22/2003

QUOTE (Blackhawk @ Dec 5 2003, 11:03 AM) Jes, it indexes a multiplicity of user owned files of any size (up to 2^9 bytes) without making permanent copies of or disturbing the originals. The theoretical maximum number of unique files it can address is also 2^9, but the actual limit under Windows is 1.4^8 files that are much smaller and disregarding the unacceptable time handicap and hardware storage limitations. (Hmmm... how big of a disk array would it take to store 2^9 files of 2^9 bytes disreagarding overhead...?)

VMWare sounds very interesting. It's on list to check out!

Thanks again!
Hmmmm, sure you have that right? 2^9 = 512, so 2^9 files of 2^9 bytes would be 512 files @ 512 bytes each = 256KB, i.e. it would fit on a floppy disk.

-------------------------
The opinions expressed above do not represent those of Advanced Micro Devices or any of their affiliates.
http://www.shellprompt.net
Unix & Oracle Web Hosting Provider powered by AMD Opterons
 12/05/2003 03:09 PM
Blackhawk
Member

Posts: 81
Joined: 12/02/2003

No, it must be that stupid pill I took this morning! It's 2E9 and 1.4E8.
 12/05/2003 03:19 PM
deuxway
Member

Posts: 193
Joined: 10/08/2003

QUOTE (Blackhawk @ Dec 5 2003, 02:03 PM)Jes, it indexes a multiplicity of user owned files of any size (up to 2^9 bytes) without making permanent copies of or disturbing the originals.  The theoretical maximum number of unique files it can address is also 2^9, but the actual limit under Windows is 1.4^8 files that are much smaller and disregarding the unacceptable time handicap and hardware storage limitations.  (Hmmm... how big of a disk array would it take to store 2^9 files of 2^9 bytes disreagarding overhead...?)

VMWare sounds very interesting.  It's on list to check out!

Thanks again!
Jes, you got there before me again, Sweet Mesquite buffalo burger for lunch, my arteries love me.

Edit: ok 2x10^9 or 1.4 x 10^8. I get it now.
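
(And for your disk array question, taking those numbers at face value and disregarding overhead: 2x10^9 files of 2x10^9 bytes each is 4x10^18 bytes, roughly four million terabytes.)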

I would like to ask a question, again risking revisiting existing ground.

Why must there be a file/file? Using a database to hold your index you might only have one file / volume (disk). Then you won't be bound by the size of the available RAM, and can avoid the hassle of managing the memory pool by hand. Your fragmentation should be smaller, so the temporary disk space while working will be smaller.

Something pretty fast, and free to try out, is Berkeley DB (http://www.sleepycat.com/) by Sleepycat. It can be orders of magnitude faster than SQL for a lookup problem, and you could probably negotiate a sensible commercial license price if you like it (they aren't a huge vendor like ...).

I have no link with sleepy cat, though I briefly looked at using their product in a project in 2001. I found them helpful and professional.

There is a Berkeley DB book (http://www.amazon.com/exec/obi...8741?v=glance&n=507846) if you need some more info.
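
To give you a feel for it, here's a minimal sketch against the Berkeley DB C API as I remember it from when I looked at it; double-check the open() arguments against the docs of whatever version you download, and the cache size, file name, and key here are just example values:

#include <db.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    DB *dbp;
    DBT key, data;
    long offset = 123456L;                     /* whatever you index: an    */
    const char *name = "some/input/file.dat";  /* offset, a record, ...     */

    if (db_create(&dbp, NULL, 0) != 0) return 1;
    dbp->set_cachesize(dbp, 0, 64 * 1024 * 1024, 1);   /* 64MB RAM cache    */
    if (dbp->open(dbp, NULL, "index.db", NULL, DB_BTREE, DB_CREATE, 0664) != 0)
        return 1;

    memset(&key, 0, sizeof key);
    memset(&data, 0, sizeof data);
    key.data  = (void *)name;   key.size  = (u_int32_t)strlen(name) + 1;
    data.data = &offset;        data.size = sizeof offset;

    dbp->put(dbp, NULL, &key, &data, 0);               /* store             */

    memset(&data, 0, sizeof data);
    if (dbp->get(dbp, NULL, &key, &data, 0) == 0)      /* look up by name   */
        printf("offset for %s: %ld\n", name, *(long *)data.data);

    dbp->close(dbp, 0);
    return 0;
}

The cache size is the knob that decides how much of the index lives in RAM; the library spills the rest to disk for you.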

Another couple of questions. Can you indicate how big your data is? Is it a few bytes / file, or does it depend heavily on the size of the file to be indexed?

I'm asking because a really disgusting trick, which can work in some cases, is to store your information in file names. You don't even create files; on UNIX it might just be a lot of hard links to one shared file. Instead you are using the OS's directories as your index and storage. OSes often cache directories very well and have highly tuned code to search and manipulate them. It depends a lot on the amount of data you need.

Hope this helps.
 12/05/2003 05:12 PM
Blackhawk
Member

Posts: 81
Joined: 12/02/2003

One reason for file/file is to enable resumption after a mid-process shutdown, deliberate or inadvertent. The app can just pick up where it was interrupted. Having been stuck babysitting touchy days-long processing jobs that had to be restarted from the beginning if any problem arose, I'm a big fan of interruptible processes. There are other reasons for file/file as well.

The RAM limitations come in from using MDBs during processing. (BTW, Microsoft's DLLs don't release their memory allocation until the calling app shuts down, so no matter how many times you compact the db, the memory usage just keeps climbing! A more efficient DB would be helpful, but you can say this for the MS solution: everybody's already got it!) The app monitors db memory usage and shuts that module down when a ceiling is bumped, then cranks it up again and resumes where it left off. In fact, that's where the 140,000,000 average file limitation comes from -- the maximum size of the processing MDB. Of course, I'll have to come up with another db module in a UNIX-ported version, but that's a known challenge.

The data size can be huge, and the extracted data files have to be easily located for subsequent processing so naming tricks are essentially precluded. The application's data handling capability does now (and will in the foreseeable future) exceed the capacity of any hardware it will run on.
 12/05/2003 09:18 PM
deuxway
Member

Posts: 193
Joined: 10/08/2003

QUOTE One reason for file/file is to enable resumption after a midprocess shutdown, deliberate or inadvertent. The app can just pick up where it was interrupted. Having been stuck babysitting touchy days-long processing jobs that had to be restarted from the beginning if any problem arose, I'm a big fan of interruptable processes. There are other reasons for file/file as well.
I agree, I like interruptible processes too. But I am sad how popular files are when transactional databases can be so much faster.

Okay, so look at the version of Berkeley DB that supports process resumption. The off-the-shelf version is the Berkeley DB Transactional Data Store (http://www.sleepycat.com/products/transactional.shtml), but if you only need a single reader/writer, you could probably get them to reduce the license price.

If the results files you are creating are smaller than a disk block, a database should win easily.

If you are processing in a 'stream', look closely at Berkeley DB as I think it can be optimised for that case (sorry to be vague, I'm in TX, and my Berkeley DB book is in MA) and should win easily.

Berkeley DB is available for free download.
I've cut a few points from their web site and included them here:
QUOTE Includes complete source code.
Small footprint — less than 500 kilobytes.
Extremely configurable: application controls the memory, disk, and other resource requirements of the database library.
Easy-to-use APIs for applications written in C, C++, Java, Perl, Python, Tcl, PHP.
Stores data in the application's native representation, eliminating the need to translate into a foreign object or relational format.
Supports full transaction semantics, so that multiple changes can be applied or rolled back atomically.
Survives software and hardware failures without losing data.
...scales to handle terabytes of data ...

Have a look at the Berkeley DB feature list: http://www.sleepycat.com/products/featurelist.shtml

QUOTE The RAM limitations come in from using MDBs during processing. (BTW, Microsoft's DLLs don't release their memory allocation until the calling app shuts down, so no matter how many times you compact the db, the memory usage just keeps climbing! A more efficient DB would be helpful, but you can say this for the MS solution: everybody's already got it! The app monitors db memory usage and shuts that module down when a ceiling is bumped, then cranks it up again and resumes where it left off. In fact, that's where the 140,000,000 average file limitation comes from -- the maximum size of the processing MDB.

Okay, so you can hold enough info on 140,000,000 files in RAM. So am I correct in assuming one benefit you see of 64bit will be to support bigger address spaces, and hence more files?

With Berkeley DB, you would never run out of RAM; it would manage the relationship between the RAM (its cache) and the disk file for you. More RAM should just make it go faster.

I'm still a bit unclear on the overall flow.

I think you are saying the application can do subsets of files, then another subset, and another etc.
So, it stores results into files so that it can cope with mid-processing failure (and restart), and because the intermediate results are too big to store in memory.

QUOTE The data size can be huge, and the extracted data files have to be easily located for subsequent processing so naming tricks are essentially precluded.
The trick was to put the file name, and its data, all into the disk file name. You would still be able to find your data files, but the same lookup would retrieve the data, and the OS would be managing the disk cache. It only works when the data file size is small (i.e. <100 bytes), and you can easily figure out the start of the key (file name).

Let's forget it, it is horrible!

Anyway, I still think something like Berkeley DB (there are alternatives too) would let you do a very fast lookup and retrieval of your data. On UNIX, mapping to the file name could be very slick; you could use something tiny like the inode number of the file as a key (unique ID for a file within a UNIX volume; 'in my day' it was a 32 bit int retrieved by, for example, UNIX fstat(2): http://dell5.ma.utexas.edu/cgi-bin/man-cgi?fstat+2).
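
As a tiny (untested) illustration of the inode idea:

#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 0 on success and puts the file's inode number in *out; st_ino is
 * a small fixed-size value, unique per volume, so it makes a compact DB key.
 * Pair it with st_dev if your files span more than one volume. */
int inode_key_for(const char *path, ino_t *out)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    close(fd);
    *out = st.st_ino;   /* use as key.data, with key.size = sizeof(ino_t) */
    return 0;
}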

One problem with Berkeley DB may be 'gracefully upgrading' when the whole problem fits in RAM. I'd have to think about that.

We'd probably have to have a private conversation for me to help more deeply; I don't want to cross your 'comfort zone'.

Anyway, I hope this has helped.
 12/05/2003 09:28 PM
jes
Senior Member

Posts: 1134
Joined: 10/22/2003

Also, another point...one mistake I've made in the past is making a recoverable process too recoverable.

For example, if you're writing out the current state every minute, or every few seconds, then the impact on the overall run time can be significant. It would almost be better to choose a larger recovery period, say every 30 minutes, or every hour.... then at most you lose an hour's work, but it drastically cuts down on the amount of file I/O you need to do. After all, is it better to have a process that runs for 7 hours but you lose at most 1 minute's work, or a process that runs for 3 hours, and at most you lose 30 minutes' work? (Figures plucked out of the air, purely at random!)
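
In code it needn't be more than something like this (an untested sketch; save_progress() and the interval are placeholders for whatever state your app actually needs in order to resume):

#include <cstdio>
#include <ctime>

class Checkpointer {
public:
    explicit Checkpointer(int interval_secs)
        : interval_(interval_secs), last_(std::time(0)) {}

    // Call after every unit of work; state is only written when an interval
    // has elapsed, so recovery cost is bounded without an extra I/O per file.
    void maybe_checkpoint(long files_done) {
        std::time_t now = std::time(0);
        if (std::difftime(now, last_) >= interval_) {
            save_progress(files_done);
            last_ = now;
        }
    }

private:
    void save_progress(long files_done) {
        if (std::FILE* f = std::fopen("resume.state", "wb")) {
            std::fwrite(&files_done, sizeof files_done, 1, f);
            std::fclose(f);
        }
    }
    int         interval_;
    std::time_t last_;
};

The main loop just calls maybe_checkpoint(n) after each file, and you pick the interval (60 seconds, 30 minutes, whatever) based on how much rework you can stand on a restart.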


-------------------------
The opinions expressed above do not represent those of Advanced Micro Devices or any of their affiliates.
http://www.shellprompt.net
Unix & Oracle Web Hosting Provider powered by AMD Opterons
 12/05/2003 09:43 PM
deuxway
Member

Posts: 193
Joined: 10/08/2003

QUOTE (jes @ Dec 5 2003, 09:28 PM)Also, another point...one mistake I've made in the past is making a recoverable process too recoverable.
Nice point.

I was pondering something related to that, but from the perspective of setting up a stream of processing (like UNIX piped processes) to avoid any unnecessary disk IO. You end up asking the same question; I just needed an overly complicated reason

I don't feel I have a good model about what needs to be saved after the processing run, and what is an intermediate result that could be scavenged and discarded early.

Anyway, I hope I have not irritated you by asking too many dumb questions, Blackhawk; that isn't my intent.