Home / How to delete a file you get from urllib.urlretrieve()

I just hit an interesting corner case in Python. Urllib module has a very useful function urllib.urlretrieve(url, filePath) which will get a given url and save it to a file. This function, however, might fail, in which case we would like to delete it so that we don’t get confused by partially downloaded, corrupted file.

To detect failure of urlretrieve() we can simply wrap it inside exception handling. As experience showed, at least on windows, calling os.remove() on the file immediately after the exception happened will fail with ‘Access denied’. According to docs, this error is returned if file is still being used. That is plausible (although not something I would expect).

My solution to the problem was to retry os.remove() few times, sleeping for a second between tries, therefore giving python time to close the file used in urlretrieve(). This seems to work. And here’s the full code:

def fFileExists(filePath):
  try:
    st = os.stat(filePath)
  except OSError:
    # TODO: should check that Errno is 2
    return False
  return True

failedDownload = False
try:
  urllib.urlretrieve(url, filePath)
  break
except:
  print "exception: n  %s, n  %s, n  %s n  when downloading %s" %
(sys.exc_info()[0], sys.exc_info()[1], sys.exc_info()[2], url)
  failedDownload = True
  # remove potentially only partially downloaded file

if failedDownload and fFileExists(filePath):
  removeRetryCount = 0
while removeRetryCount < 3:
  time.sleep(1) # try to sleep to make the time for the file not be used anymore
  try:
    os.remove(filePath)
    break
  except:
    print "exception: n  %s, n  %s, n  %s n  when trying to remove
file %s" % (sys.exc_info()[0], sys.exc_info()[1], sys.exc_info()[2], filePath)
    removeRetryCount += 1
Go to index of articles.