''' Mean Filter, semi-optimized Created by: Nathan Jennings Updated on: November 15, 2007 Mean Filter using Numeric/Numpy arrays, Gdal built for Windows, Python 2.4 This filter uses a kernel structure to process pixel elements stored in a Numeric array. Input - TIF file Output - TIF file as described in Gdal kernel size - 3x3 (3), 5x5 (5), etc NOTES: This function will only work on Windows OS and uses the Python 2.4 gdal build for Python 2.4 as found on www.gdal.org, Frank Warmerdam. Numeric Python (Numeric) is an older version of Numpy (or SciPy) that can be used with this version of gdal 1.4.2. Performance assistance from Chris Barker, NOAA. I appreciate Chris' input and knowledge of numpy and array processing. Originally, I developed this function using a 6 x 400 x 600 Landsat TM subset. It took ~ 5 minutes, which I think is long. I am not using a look up table or any other optimization, but I have read that Python processes slower than C++ or other lower level programming language. This same structure can be used to put in any other kind of kernel based processing such as median, guassian, or custom kernel. Most of the code will likely go inside the inner most "for loops." Other intermediate calculations may be needed and will need to be placed accordingly. Update - November 15, 2007 Thanks to Chris' assistance, I have succesfully processed the same image in a fraction of the time (10 seconds v. 5 minutes), however, when I processed a 6x3000x4000 image, it took about 2 hours. I noticed that my RAM was not in use, but my page file was and my system actually made it bigger. I will continue to look into this as well. I did have some intermediate arrays, that I did eliminate, which may help some. I will look into optimization routines to incorporate or if others have suggestions you can contact me at nate@deltateck.com. I would like to hear about other kinds of work being done in in C++, provided gdal will work with it. I would be using the free C++ compiler (g++ or similar) and not writing the actual bindings to build gdal. I am aware that writing Python will probably require less code and there are some Python type performance enhancers I will be looking into such as in-line, blitz, and weave (from Numpy) as well as PyRex. ''' import gdal import Numeric as N #This defines a generic array to use later on import numpy #numpy is needed for Numeric to numpy array conversion import os, sys import time #Open Multi-band image" print "Starting Mean Filter..." img = gdal.Open(r'c:\\nate\\sactmclp_sub.tif') #img = gdal.Open(r'g:\\remote_sensing_images\\Landsat\\TM\\sacclip2.tif') #Load Image into a Numeric (Python) Array #The line below is necessary because I use it below. #You don't have to use an array for this, but you can use various #gdal objects to get the bands, rows, and columns of an image. See below. #dal requires processing by band....see below. a = img.ReadAsArray().astype(N.Float16) nbnds = a.shape[0] #a.astype(N.Float) print a.shape #print a #Assign an output file. I have it hard coded here. outfile = "c:\\nate\\sactm_mean_img.tif" #outfile = "g:\\remote_sensing_images\\Landsat\\TM\\small_mean_img2.tif" #outfile = "g:\\remote_sensing_images\\Landsat\\TM\\verylgtest_mean_img2.tif" #Create output file parameters #The value here is one of many values for different formats of imagery #check out the website above for more details. format = "GTiff" out_driver = gdal.GetDriverByName( format ) #Create Copy is a gdal function that basically makes a copy of the input image #This will also keep any georeferencing with the image. #If Create is used, then you will loose the georeference and have to add #it in either programmatically or through other software. outputimg = out_driver.CreateCopy(outfile, img, 0) #The following shows how to report out the bands, rows columns #Once it is in an array. #You can also report out below using gdal objects such as #RasterCount, RasterX, and RasterY (i.e. img.RasterCount) #The following is not necessary to run the function, but for illustrative #purposes and to show some of my own investigation. #print a.shape #Prints the dimensions of the image array #print "Num Bands: ", a.shape[0] #print "Num Rows: ", a.shape[1] #print "Num Cols: ", a.shape[2] #print a #Prints the array out to screen #Filter (kernel) size - 3x3 - filter = 3 filter = 3 #Compute the filter Window_Total = filter*filter offset = (filter-1)/2 #filter definition (in this case it will do a mean) f = N.ones((filter, filter), N.Float16)/Window_Total m = f.shape[0] n = f.shape[1] #print m #print n #print f #This sets up the proper row and column to start the processing #Since the method operates by using a kernel, the index for the #rows and columns need to be offset, which is determined by the kernel size row = offset #For a 3x3 kernel, the offset will be 1; 5x5 will be 2, etc. col = offset bnd = 0 #This initializes the band for loop #Just a counter. I wanted to see how many calculations were being done, approximately CalcCount = 0 #The following loops over the bands, row,s and columns, calculates the mean #and assigns them to the respective output # bnd initialized at 0 - nbands (in this case 6), so goes from 0-5 (i.e. a total of 6 bands) for bnd in range(nbnds): # print "Working on Band: ", bnd #Gdal operates on a given band...this gets the band #Then stores it in a Numeric array, so that it can be processed #gets the band of the output #and allocates an array for it. The array is where the results of the #mean will be stored - GetRasterBand starts with 1 in_band = img.GetRasterBand(bnd+1) '''You could do this temp thing, but it is not needed and takes up more space''' #in_temp = in_band.ReadAsArray().astype(N.Float16) #band_array = numpy.asarray(in_temp) '''I am converting the Numeric array to a Numpy array so that the algorithm in the for loops below actually work; this is a current limitation of my version of gdal and the fact it uses Numeric instead of the newer Numpy I am also casting the Integer values in the array to Float, since the resultant calculation is a decimal value and I don't want truncation to occur to the values until my output''' band_array = numpy.asanyarray(in_band.ReadAsArray().astype(N.Float16)) '''This could be set from above outside the band loop''' nrows = band_array.shape[0] ncols = band_array.shape[1] '''Just printing''' print "Input Dimensions ", band_array.shape print "Input Rows: ", band_array.shape[0] print "Input Cols: ", band_array.shape[1] #print band_array '''Same as above for output. The output needs to iterate inside the band loop to provide the correct output band for the gdal WriteArray''' out_band = outputimg.GetRasterBand(bnd+1) #out_array = out_band.ReadAsArray() #temp_array = out_band.ReadAsArray().astype(N.Float16) #output_array = numpy.asarray(temp_array) '''Thanks to Chris Barker from NOAA for his help with the array structure here as well ast the Numeric to Numpy conversion above; I would never have figured that out''' '''I was only able to get the algorithm to work by using Chris' suggestion of computing the center part of the input image. I am still looking into why''' output_array = numpy.asarray(out_band.ReadAsArray().astype(N.Float16)) center = numpy.asarray(output_array[offset:nrows-offset, offset:ncols-offset]) center[:] = 0 #just printing print "Output Dimensions ", output_array.shape print "Output Rows: ", band_array.shape[0] print "Output Cols: ", band_array.shape[1] #print output_array #print centerbnd.shape #print centerbnd #After reading in a band to an array, loop over rows and columns #by default, i and j are 0 for i in range(m): for j in range(n): '''All of the comments here are from my previous, slow attempt. There is really just one line, the algorithm to be calculated. After all of the iterations, the resultant band holds the mean. NOTE: the indentation is important and matches to the "j" loop''' #Work on the kernel of pixels within a given band #This looping gets to a target pixel to do calculations #winrow = row - offset #Set the initial row for the kernel #WinSum2 = 0 #CalcMean = 0 #Initialize the CaclMean variable #for winrow in range(winrow, row+offset+1): # wincol = col - offset #Make sure to start the next row at column 0 # for wincol in range(wincol, col+offset+1): #Sum the pixels in the kernel #The array stores a specific value in the array (i.e. the pixel value) #WinSum += band_array[winrow][wincol] #print i, " ", j #WinSum2 += a[i:nrows-m+i+1, j:ncols-n+j+1]* f[i,j] #WinSum += WinSum #centerbnd+= a[i:nrows-m+i+1, j:ncols-n+j+1]* f[i,j] center+=band_array[i:nrows-m+i+1, j:ncols-n+j+1]* f[i,j] #print center #print "Done printing Mytet" #print WinSum2 #print "Finished" #print "Array R,C: ", a[i:nrows-m+i+1,j:ncols-n+j+1] * f[i,j] #center += a[i:nrows-m+i+1,j:ncols-n+j+1] * f[i,j] #print a[bnd, i:nrows-m+i+1, j:ncols-n+j+1]* f[i,j] #Notice the CalcMean is indented to calculate within the "col for loop" #not within the "wincol loop" #CalcMean = WinSum / Window_Total #Assign mean within kernel to output array #This is offset by 1 row and 1 column as a result of the kernel size being used #out_array[row][col] = CalcMean #increment the counter by band...not needed to run the program CalcCount +=1 #Write array of caclulated means to image file (ie. for a given band) #Notice this is indented to match the "band for loop" print "Writing File" '''The next 2 lines write the file to the image The first line take the "Numpy" array and converts it to a Numeric array, which is required by the second line. WriteArray is a gdal function and will only work with Numeric arrays for this version.''' outMytest = N.array(output_array) out_band.WriteArray(outMytest) print "Finished band: ", bnd+1 print "Created band: " + str(bnd+1) print "Total pixels Calculated: " + str(CalcCount) #This is not accurate, I need to put the counter #in the right place; just for testing #Clean up data files #This may be necessary to see changes in output file #if you need to redo del outputimg del img