View Full Version : 2 databases- purge duplicates- How ?
Parangles
10-16-2003, 08:41 AM
I have 2 mailing list/address label databases that are in .wdb- MS Works- and I know there are a lot of name/addresses that are on both of the lists. How can I identify these ?
Thanks.
sdkfz
10-16-2003, 09:19 AM
I am not familiar with this database, but I can give you general database hints and ideas
Assuming that there is a good identifier for the records to check against (I know you do not have these but Social Security Number is perfect), you copy one table with the formatting only, then you make the field that guarantees a unique address/person a "unique key" or "Primary key". For your purposes the e mail address could work. Now you have three tables. One blank that will only allow one record on the key field and the two populated tables.
You append the first table into the blank, there should be few if any issues unless you have several people listed with the same e mail address, then you copy in the second table, here you will have the duplicates rejected from the append becasue they are already in the table. Of course on a table like this if you have a typo on one address it would still append so "Jim" is still in there twice once with a good address and once with the bad one.
Another idea would to be make a single field called Name where you combine all the names fields, I am sure you know two "Jim's" or two "Smith's" but probably not two "Jim B Smith's" this combined field could be used as the primary key instead of the e mail address. You will have to make the best call on that.
Good Luck
doctorgonzo
10-16-2003, 09:49 AM
There's no easy way to purge duplicates from a database. If you have a unique ID like a SSN, then it usually isn't a problem, but with a mailing list that is less likely. You can try to eliminate duplicate addresses, but as mentioned if there is a typo (or simply if the address was entered different, like 123 N 1st St and 123 1st St N) then the duplicate would remain. You can also go by phone number, e-mail address, anything that you can use to match people up.
I have not used the database program that comes with MS Works in about a decade, so I don't know how powerful it is when it comes to handling relational databases. I do most of my work in Access and SQL Server, and unfortunately, getting rid of duplicates a lot of times comes down to doing it manually.
Parangles
10-16-2003, 10:01 AM
You lost me ! Ashamed to say I didn't follow that at all.
I just started with databases last week and received one of them from the vendor as, I think, csv format which I entered into my ancient MSWorks 4.5. The other list I am entering by hand . Both are only about 1000 names each. The programs Tools/Options available are limited as far as I can see, to
' sorting' by field- ascending , descending , and
" filter " by field. The lists only have name, address, city, zip.
I know there are lots of duplicates- one list is residents in an affluent neighbourhood, and the other is of parents of kids at a private school nearby.
Maybe I need to buy a new database program ? Any suggestions ?
Thanks !
PS- Thanks DoctorG- didn't see yours which was posted when I was typing this. My best idea now is to load the lists each one on a different PC in the same room ( have that ) and two people scrolling with each list sorted alphabetically, one calling out names and the other person deleting duplicates from their list. Probably take ? an hour ?
Thanks
doctorgonzo
10-16-2003, 10:25 AM
That may be the best option in your situation. I don't think Works really allows for complex database manipulation. Tedious, but for a one-time deal it is better than getting Access and having to learn the complexities of that.
Gonz - Works is a flat file database, not relational at all.
sdkfz
10-16-2003, 11:17 AM
Parangles,
edit . I just saw GLC's comment - there is now way to use my first info on that program because there is no relational capability
end edit
Sorry bout losing you on that, If this is a one of a kind sort of thing then Dr G is right, no reason to spend the money on the DB program to use it once, especially since it is not only the purchase price but the learning curve on it too.
If both sets of data can be opened in Excel, then you could
1) make each the same column wise (ie name then address then state etc)
2) Open a new spreadsheet and copy both into the new one (always a good idea to keep the original data safe just in case...)
3) Sort in the new single sheet and then eliminate the duplicates, this way you do not use the resources of two people and at a glance you can tell if there was a address that is the same just typed diferently like the Doctor mentioned.
DB's are fun, a simple way to think of them is they are made up of worksheets in excel. Each worksheet is a table in the DB. Each table should have some sort of unique field to 'relate' back and forth, SSN, customer number, etc. Each table handles only one type of data ... for example ...in one table you have the customer info (names address etc) another has orders made by the customer, another has complaints etc
This way you enter the customer info once and can have many orders that 'relate' to the one customer. The key is to have a good identifier up front.
Parangles
10-16-2003, 04:15 PM
Thanks to all ! sdkfz that worked great - I selected all, copy and pasted into new spreadsheet, from each of the two, sorted new combined one alphabetically and the dupes are visible for deletion.
Cheers.
vBulletin® v3.7.0, Copyright ©2000-2008, Jelsoft Enterprises Ltd.