My job involves a fair amount of fuzzy matching, when working on the analysis for a project I’ll often use excel as it does everything I need but its also great as you can send spreadsheets to the business knowing they’ll be comfortable using it and fairly self sufficient.
This fuzzy matching Add-In from Microsoft Research really caught my eye, as it meant I could take sample records e.g.: Name1 and Name2 and run a rough estimation for how accurate the matching should be.
Firstly you need to have Excel 2010 installed, the add-in can be downloaded from here, unzip and click the usual next buttons.
Here is a simple example to get you started:
- Create a new spreadsheet and paste the following in A1
|New Customer ||Existing Customer |
|Jimmy Paige ||Jimmy Page |
|Ladee Gaga ||Lady Gaga |
|Jimmi Hendriks ||Jimmy Hendricks |
- Fuzzy lookup needs tables to compare values so…
- Highlight all the rows(not entire column) in column A and hit Control + L this creates a table within excel for that column, ensure you tick the “My table has headers”
- Highlight the rows(not entire column) in column B and hit Control + L and tick the headers column, you now have the 2 tables which you are ready to compare.
- Your screen should look like this
- Now click the fuzzy lookup link in the toolbar and then click the fuzzy logic button, a menu should appear on the right
- At the top the 2 tables which are to be compared are chosen, left column and right column are pre-selected, in the match columns the columns to be compared have already been selected, if not click the compare button which will add them to the match columns list.
- You can then select the type of comparison, the default is text, other options are Zip code, Phone Number or Social Security, you can define your own by clicking configure (I’ve yet to do this)
- Hit ‘Go’ and you should see the following:
- As you can see we have 2 matches with similarity ratings,
I’ve found the add-in to be pretty unstable ranging from running forever to not running at all, to run a new comparison I usually restart excel!
Don't let this put you off!