Question:

I've got a web application that re-sizes images. The re-sized images are written to disk in order to cache them. What is the best way to prevent multiple, simultaneous requests from generating the same image?

A couple of things to note: we have millions of images (measured in terabytes). Cached images that haven't been viewed in a while are removed. We have a web farm, but each web server has its own local cache (the originals are stored on another server). We also place the re-sized images in a second-tier cache once they are generated, so other web servers can check there to see if the image is cached; if it is, it is copied to the local cache.

I've considered using locks (I posted a class that I'm considering using here). But that obviously won't work with the 2nd-tier cache and I'm not sure if it is a good idea in general on a web server to use locks (though I'm not sure why, just a bunch of vague references to it being a bad idea).

I've also considered writing a temp file that I could check before I start creating the image, but I'm concerned that Windows won't clean up the file properly 100% of the time (locking issues, etc).

Any ideas are appreciated.

Answer 1:

Did you consider using middleware for that, such as MSMQ or ActiveMQ? Once the image resize request is submitted to the web server, it goes onto the queue. A separate application then checks the queue, resizes the image, and saves it to the cache.

Answer 2:

I would avoid locks if you can, especially since you don't need to lock here. You also want to avoid one machine locking based on another machine's processing. If two machines create the same resized image, I assume the results would be identical. So, if two machines happen to resize the same image because they both missed the cache, then it's only slightly less efficient (wasted time) but very likely better than locking (and possibly deadlocking) and trying to optimize the edge case.
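One way to tolerate duplicate work without locks is sketched below. It is only an illustration, not something the answer prescribes: each writer saves to its own temp file and then tries to move it into place, and whoever loses the race simply discards its copy. The `CacheWriter` class, the `TryPublish` name, and the temp-file naming scheme are hypothetical.

```csharp
using System;
using System.IO;

public static class CacheWriter
{
    // Publishes resized bytes to cachePath without taking any lock.
    // Returns true if this call put the file in place, false if a copy
    // was already there (or another writer won the race).
    public static bool TryPublish(string cachePath, byte[] resizedBytes)
    {
        if (File.Exists(cachePath))
            return false; // already cached, nothing to do

        // Hypothetical naming scheme: a unique temp file next to the target,
        // so readers never see a half-written image.
        string tempPath = cachePath + "." + Guid.NewGuid().ToString("N") + ".tmp";
        File.WriteAllBytes(tempPath, resizedBytes);
        try
        {
            // The two-argument File.Move throws IOException if the
            // destination already exists.
            File.Move(tempPath, cachePath);
            return true;
        }
        catch (IOException)
        {
            // Lost the race; the other writer's image is assumed equivalent.
            File.Delete(tempPath);
            return false;
        }
    }
}
```

Calling `CacheWriter.TryPublish(path, bytes)` twice for the same path leaves exactly one file behind; the second call just returns false.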

One option would be to create the resized image locally and enqueue the cached item into a central queue (a database? in memory on a central service?), either with the data or with a reference to how to pull it from the front-end machine. The centralized cache queue is processed serially. If duplicates get put in the queue between the time the image is resized by more than one machine and the time the queue item gets processed, it doesn't matter, since processing the duplicate would simply skip pulling it because it's already on disk.
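A minimal in-process sketch of that serial queue follows, assuming a `BlockingCollection` as a stand-in for whatever central queue you pick (database table, MSMQ, and so on). The `CacheQueue` class and its member names are hypothetical; the point is only that a single consumer makes duplicates harmless.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public sealed class CacheQueue
{
    // Stand-in for the central queue; any web server may enqueue.
    private readonly BlockingCollection<string> _pending = new BlockingCollection<string>();

    // Keys already stored. Only the single consumer touches this set,
    // so it needs no synchronization.
    private readonly HashSet<string> _stored = new HashSet<string>();

    public void Enqueue(string imageKey) { _pending.Add(imageKey); }

    // Signals that no more items will arrive, which lets Drain return.
    public void CompleteAdding() { _pending.CompleteAdding(); }

    // Processes items one at a time on the calling thread and returns how
    // many were actually stored. Duplicate keys are skipped.
    public int Drain(Action<string> store)
    {
        int storedCount = 0;
        foreach (string key in _pending.GetConsumingEnumerable())
        {
            if (!_stored.Add(key))
                continue; // duplicate: already in the second-tier cache
            store(key);
            storedCount++;
        }
        return storedCount;
    }
}
```

In a real deployment the "already stored" check would more likely be a file-exists test against the second-tier cache than an in-memory set.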

Answer 3:

Firstly, generate the filename with a GUID so that you know you aren't going to have duplicate filenames.

Guid.NewGuid()

Then avoid holding a lock on the image files by using the code below:

    using System.Drawing;
    using System.IO;

    public static Image GetImageWithoutLocking(string workingPathFileName)
    {
        // File.ReadAllBytes opens the file, reads every byte, and closes it
        // again, so no handle on the file stays open after this line.
        // (A single FileStream.Read call is not guaranteed to fill the buffer.)
        byte[] img = File.ReadAllBytes(Path.Combine(LivePaths.WorkingFolder, workingPathFileName));

        // GDI+ needs the stream to stay open for the lifetime of the Image,
        // so the MemoryStream is deliberately not disposed here.
        return Image.FromStream(new MemoryStream(img));
    }

I have this code running very effectively and it was the only way I could find to be sure that the file is never locked.

Answer 4:

Using a database to list file hashes would be the quickest way to do it. The list can then be shared between all tiers, and it also lets you offload any locking to transactions in Transact-SQL (T-SQL).
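To make the idea concrete, here is a rough sketch of deriving a deterministic key for a resize request. It assumes the hash covers the source path plus the target dimensions rather than the file contents, which is my own guess at what would be listed; the `CacheKey` class and the key format are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class CacheKey
{
    // Returns the same 64-character hex key for the same source image and
    // target size, so every server computes an identical database key.
    public static string For(string sourcePath, int width, int height)
    {
        // Hypothetical key format. Lower-casing assumes a case-insensitive
        // file system, as on a typical Windows server.
        string raw = sourcePath.ToLowerInvariant() + "|" + width + "x" + height;
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(raw));
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

With a unique constraint on that key column, the server whose insert succeeds does the resize, and everyone else finds the existing row.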

Other large-scale applications that have to store terabytes of data, such as Symantec Enterprise Vault, do the same thing.

Answer 5:

It should be no different from web apps that need to control the editing/updating of data in a database.

What I have tried, successfully, was storing the image as a blob field in the database. I had the blob editing controlled just as any other data field would be.

This means you have to be familiar with how web services work with the database to deal with collisions and concurrency control.

As an alternative, if you cannot afford a highly scalable RDBMS: instead of storing the image as a blob in the database, you could store the file name/path, with the actual image stored in the file system. The database provides the unique key to an image. All access to any image has to go through its database record. Every time a new image is generated, the following takes place under an atomic transaction, in the order prescribed:

1. it is stored under a new name/path
2. if success, the database record is updated
3. if success, the old image is deleted

These are the contingencies you have to handle: if the last step is not successful (a system or power failure, maybe), the db record would be rolled back and you would have an orphan image. Or, if the db update fails, the newly stored image would end up as an orphan.

Therefore, to keep your file system sane and clear away orphans, you will probably want to delete images that are older than 24 hours.
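The three-step order above can be sketched roughly as follows. An in-memory dictionary guarded by a lock stands in for the database record and its transaction, and the `ImageIndex` class, its members, and the `.img` extension are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public sealed class ImageIndex
{
    // Stand-in for the database table: image key -> current file path.
    private readonly Dictionary<string, string> _records = new Dictionary<string, string>();
    private readonly object _gate = new object();

    // Returns the current path for the key, or null if none is recorded.
    public string PathFor(string imageKey)
    {
        lock (_gate)
        {
            string path;
            return _records.TryGetValue(imageKey, out path) ? path : null;
        }
    }

    public void Replace(string imageKey, string folder, byte[] newBytes)
    {
        // Step 1: store the new image under a fresh, unique name.
        string newPath = Path.Combine(folder, Guid.NewGuid().ToString("N") + ".img");
        File.WriteAllBytes(newPath, newBytes);

        // Step 2: point the record at the new file. If this step failed,
        // newPath would be left behind as an orphan.
        string oldPath;
        lock (_gate)
        {
            _records.TryGetValue(imageKey, out oldPath);
            _records[imageKey] = newPath;
        }

        // Step 3: delete the old image. If this step failed, oldPath
        // would be the orphan instead.
        if (oldPath != null)
            File.Delete(oldPath);
    }
}
```

Readers always go through `PathFor`, so they see either the old file or the new one, never a partially written image.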

For a more robust solution, refer to a description of my web app caching technique:

http://h2g2java.blessedgeek.com/2010/04/page-caching-using-request-parametric.html

Answer 6:

I would suggest two solutions that are similar in nature. One of them is to use a WCF service layer. Within this service, you can use a concurrent dictionary. You should develop a hash code in such a way that the same image always produces the same hash, so you will have a single entry per image in your concurrent dictionary. You can also add a time stamp to the class that represents the image; it might be useful. Once you generate the image, you can update this entry with the location of the generated image. You can also keep a bit flag that indicates the image is being processed; if another request comes in asking for the same resize, you ignore that request. Besides using a concurrent dictionary, you could also lock on a single key within the dictionary, but if you use a bit flag such as CurrentlyProcessing, you won't need a lock. This would be a very fast and efficient solution, IMO.

Another solution would be a distributed hash table such as AppFabric Cache. Same logic as above.

What do you think?
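Roughly, the "currently processing" flag could look like the sketch below, which uses `ConcurrentDictionary.TryAdd` as the atomic flag. The `ResizeCoordinator` class and `TryResize` method are hypothetical names, and the WCF hosting is left out entirely.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class ResizeCoordinator
{
    // Keys of images currently being resized. Presence of the key is the
    // "currently processing" flag; the byte value itself is unused.
    private readonly ConcurrentDictionary<string, byte> _inProgress =
        new ConcurrentDictionary<string, byte>();

    // Runs resize and returns true if nobody else is working on imageKey;
    // returns false immediately (ignoring the request) otherwise.
    public bool TryResize(string imageKey, Action resize)
    {
        // TryAdd is atomic: only one caller per key can succeed.
        if (!_inProgress.TryAdd(imageKey, 0))
            return false;
        try
        {
            resize();
            return true;
        }
        finally
        {
            // Clear the flag even if resize throws, so the key is not
            // stuck as "processing" forever.
            byte ignored;
            _inProgress.TryRemove(imageKey, out ignored);
        }
    }
}
```

A caller that gets false back would typically retry the cache lookup a little later instead of resizing.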

Answer 7:

I am not sure you really need to solve this problem. Consider the following points:

what if one server starts resizing a specific image and the resizing process somehow gets "stuck"? If you implement what you describe, then all other servers would wait for that server to finish... not sure this makes for a good user experience
OTOH, if you don't implement that, you only lose a bit of time but are not confronted with solving the above issue...

I would definitely implement some sort of DB-based or (central) in-memory cache of the contents (image IDs) of the 2nd-tier cache, to keep machines from getting into conflicts when copying the resized image into the cache...

Answer 8:

If you want a client to be able to process only one image at a time, first store a flag in the viewstate when the request is submitted. The flag is raised when the data is submitted and reset when you finish processing the image. When you get a request, just check whether the flag is raised; if it is, refuse to process the image.

In the second case, namely if you want to prevent the user from submitting the very same image, you can store the name and size (in bytes) of the image in viewstate, and when the user selects an image, compare its name and size before you process it. If the name and size match what you stored in viewstate, refuse to process the image; otherwise, process it.

Hope it can help you.