2

I have about 30 different worksheets in a workbook that all have well over 30k entries in each of them. In the second column of each row, there is a path to a file.

This file name could be anything. I, on the other hand, have a list of about 450 different file names that I am checking against.

If the file path does not contain any of those 450 file names, then I want the row deleted.

I am trying to do this as quickly and efficiently as possible. I am not very experienced in VBA and Excel.

8
  • FYI this will take a super long time since you'll be checking (30000 * 30) * 450 Commented Jul 30, 2013 at 19:02
  • I understand this. I am trying to see if there are more efficient ways of doing this. To my knowledge, that is the only way. Commented Jul 30, 2013 at 19:03
  • If the filenames are stored in a hashtable, then the time drops to (30000 * 30)+450. Commented Jul 30, 2013 at 19:04
  • 1
    This could also be done with a bunch of VLOOKUPs fairly quickly (provided you set the last parameter to true), and without writing any VBA. I imagine it would be about 5 times slower than using a hashtable/dictionary. Commented Jul 30, 2013 at 19:19
  • 2
    Yep. If you set it to "approximate match", it uses a binary search, and returns a wrong answer if it can't find a right one. If you set it to "exact match", it checks every value individually. Be sure to sort the filenames beforehand, and check whether the filename returned by the VLOOKUP actually matches the filename you're testing. Commented Jul 30, 2013 at 19:22

1 Answer 1

3

One simple way to do this, without writing any VBA, is to use a VLOOKUP:

=VLOOKUP(A1, <absolute_reference_to_file_list>, 1, TRUE)=A1

Then use the filtering tools to delete all rows with FALSE in the new column. A small number of columns might display "#N/A"; this should be treated the same as FALSE.

Note that this method requires the list of filenames to be sorted in advance, since Excel will be performing a binary search on the list of filenames.

This isn't the fastest possible method, but it's fairly fast, and may well be sufficient for your needs.

Sign up to request clarification or add additional context in comments.

5 Comments

Please excuse my unfamiliarity with excel. But how do I refer to <absolute_reference_to_file_list>? I have the file list in another spreadsheet.
Excel will do most of the work for you if you put the cursor in the right place in the formula, then tab over to the file list (which you should also have open) and select the list of files. Actually, I think if you click the top of the column with the file list in it (i.e. the letter "A"), Excel will do exactly what you need. Then tab back to where you were editing the formula and press Enter.
Afterward, the absolute reference should look something like '[File List.xlsx]Sheet1'!$A$2:$A$451 or '[File List.xlsx]Sheet1'!A:A. The dollar signs are what make the reference absolute. You probably don't need them if the reference is in the second form, though it can't hurt to add them.
Sorry to take back my comment. When running this, I am getting only getting TRUE or FALSE in the first row of the new column. Since its really no big deal, I just copied and pasted my fileNames into a seperate row inside of the spreadsheet I am trying to count.
So you're saying it worked after pasting the filenames into the same sheet? If the other cells in the new column are blank, you may need to fill down (double-click the lower right-hand corner of the cell while it's selected).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.