PxPlus User Forum

Main Board => Discussions => Programming => Topic started by: Loren Doornek on May 07, 2021, 01:58:00 PM

Title: Extract failing?
Post by: Loren Doornek on May 07, 2021, 01:58:00 PM
Recently, we've had a couple of instances where the EXTRACT on a channel *seems* to be failing to prevent other processes from accessing the record, and I'm trying to figure out whether that is really the case and, if so, why.  I'm not certain that is the problem, but it's really the only logical explanation I can find.

We have an order update process that EXTRACTs the order record and writes the record to a different file, but never touches the extracted channel again until it REMOVEs the record when done.  The process hasn't changed for years, and has never been a problem.  Recently, a couple of the records were duplicated, which (seemingly) could only happen if two processes had extracted the record at the same time.

The two processes that run are both using the same FID of "IO", but I don't think that should matter.  The servers are Linux (RedHat) with multiple CPUs, and are fairly new and fast, so the processing time is milliseconds when this occurs.  I'm wondering if there is some write caching in the OS or hardware that is affecting this, but I don't know how to check that.

In searching these forums and the documentation, I've seen suggestions of using "TIM=0" on the extract, and see that there is a FLUSH command to flush the write buffer to disk.  I don't know if either of these would be useful, but would appreciate any feedback or suggestions!
Title: Re: Extract failing?
Post by: Mike King on May 07, 2021, 02:26:33 PM
The only time we have seen EXTRACTs fail is when using a shared/mounted drive that implements opportunistic locking or doesn't provide locking at all.

Have you been able to pin the problems you are having to a specific OS?

As for the FID value -- it is not used for locking.  On Linux we use the OS's internal fcntl to lock sections of a file and on Windows we use the OS LockFile or LockFileEx (depending on context).
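
For readers who have not used fcntl locks directly, here is a minimal Python sketch of byte-range locking on Linux. It is not PxPlus code and makes no claim about how PxPlus lays out its files: the fixed RECORD_SIZE, the record numbers, and the helper names (try_lock_record, other_process_can_lock, demo) are all assumptions made for illustration. It only shows that a lock on one record's byte range blocks a second process on that range while leaving a neighbouring record free.

```python
import fcntl
import os
import tempfile

RECORD_SIZE = 64  # hypothetical fixed record length, for illustration only


def try_lock_record(fd, recno):
    """Try a non-blocking exclusive fcntl lock on one record's byte range."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB,
                    RECORD_SIZE, recno * RECORD_SIZE, os.SEEK_SET)
        return True
    except OSError:
        return False


def other_process_can_lock(path, recno):
    """Fork a child that opens the file itself and tries to lock the record."""
    pid = os.fork()
    if pid == 0:
        code = 2  # stays 2 only if something unexpected fails in the child
        try:
            fd = os.open(path, os.O_RDWR)
            code = 0 if try_lock_record(fd, recno) else 1
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


def demo():
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"\0" * RECORD_SIZE * 4)
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDWR)
        held = try_lock_record(fd, 1)                          # lock record 1
        blocked = not other_process_can_lock(tmp.name, 1)      # same record
        neighbour_free = other_process_can_lock(tmp.name, 2)   # different record
        os.close(fd)
        return held, blocked, neighbour_free
```

On a local Linux filesystem, demo() should return (True, True, True): the lock is held, a second process is refused on the same record, and a different record is still available.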
Title: Re: Extract failing?
Post by: Loren Doornek on May 07, 2021, 02:37:29 PM
Both servers that had this issue are running CentOS 7.  Neither is using any shared/mounted drives for the filesystems we are working with.  My thinking was that there is some caching at the OS or hardware (RAID controller or physical disk) level, but that may not be relevant if PXP is locking the record at the OS level.  I'm at a loss to figure out what's going on here since I've never seen an EXTRACT fail in all my years working on PVX/PXP.
Title: Re: Extract failing?
Post by: Mike King on May 07, 2021, 03:53:31 PM
Make sure you don't try to access the same file on multiple channels.  Record locking on Linux is by process, inode, and address, as opposed to by channel.  Thus, if you extract on channel 1 and then issue a read of the same record on channel 2, the extract will get freed.
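
The per-process behaviour can be reproduced outside PxPlus. The Python sketch below is a rough illustration under stated assumptions, not PxPlus's implementation: the two file descriptors stand in for two channels, RECORD_SIZE and the helper names are hypothetical, and the read on the second channel is modelled as a brief lock and unlock of the same byte range (the thread does not say exactly what a read does internally).

```python
import fcntl
import os
import tempfile

RECORD_SIZE = 64  # hypothetical fixed record length, for illustration only


def record_lock(fd, recno, cmd):
    """Apply an fcntl lock command to one record's byte range."""
    fcntl.lockf(fd, cmd, RECORD_SIZE, recno * RECORD_SIZE, os.SEEK_SET)


def other_process_can_lock(path, recno):
    """Fork a child that opens the file itself and tries to lock the record."""
    pid = os.fork()
    if pid == 0:
        code = 2  # stays 2 only if something unexpected fails in the child
        try:
            fd = os.open(path, os.O_RDWR)
            try:
                record_lock(fd, recno, fcntl.LOCK_EX | fcntl.LOCK_NB)
                code = 0
            except OSError:
                code = 1
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


def demo():
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"\0" * RECORD_SIZE * 4)
        tmp.flush()
        chan1 = os.open(tmp.name, os.O_RDWR)  # stands in for channel 1
        chan2 = os.open(tmp.name, os.O_RDWR)  # stands in for channel 2

        # "Extract" record 1 on channel 1.
        record_lock(chan1, 1, fcntl.LOCK_EX | fcntl.LOCK_NB)
        before = other_process_can_lock(tmp.name, 1)

        # Same process touches the same range through channel 2.
        # The lock belongs to the process, so this replaces and then
        # removes the lock taken through channel 1.
        record_lock(chan2, 1, fcntl.LOCK_SH | fcntl.LOCK_NB)
        record_lock(chan2, 1, fcntl.LOCK_UN)
        after = other_process_can_lock(tmp.name, 1)

        os.close(chan2)
        os.close(chan1)
        return before, after
```

On a local Linux filesystem, demo() should return (False, True): another process is refused while the lock is held, but can take the record once the same process has locked and unlocked it through the second descriptor, even though the first descriptor never released anything.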

You can somewhat resolve this by turning on the 'SB' system parameter; however, that will ONLY prevent EXTRACTing from BOTH channels.

In the future, we are looking at switching to a different OS file locking mechanism that is available on some Linux platforms; however, it is not supported on all platforms.
Title: Re: Extract failing?
Post by: Mike King on May 07, 2021, 04:30:34 PM
One thing you can do if you are accessing the file on multiple channels is to set BOTH the 'XI' and 'SB' system parameters.  This will force the system to internally check your own process for an EXTRACT (due to the 'SB' parameter) and allow you to read through it (due to the 'XI' parameter).  Other processes should not be impacted -- and in fact I generally recommend setting both of these parameters; otherwise, processes that are doing a straight read through your files have to watch out for extracted records causing them record-locked errors.
Title: Re: Extract failing?
Post by: Loren Doornek on May 07, 2021, 05:53:10 PM
Thanks for the tips, Mike.  Based on your explanation of how the Linux locks work, I did find some old logic further down the stack in called programs which reads the record (and thus clears the extract).  That logic is obviously a problem, but it isn't executed in these two instances.  Still, it shows that there might be a few other bugs hiding away that I haven't found yet!  I'll keep digging, and look into using the SB and XI params.
Title: Re: Extract failing?
Post by: Mike King on May 08, 2021, 03:11:26 PM
I would set both 'XI' and 'SB' (assuming it doesn't bother your logic) just to be on the safe side, rather than trying to find every place where you might be accessing the same file twice.  These will help get around the issue with the way Linux handles record locks.