How to Buy 5zip data Products

Delivery of the data: Data delivery quality covers not only how the data is physically delivered but also the turnaround time and the customer service involved. Files of five million records or fewer should be returned within 24 hours; for 100 million records, expect a turnaround of two to four days. Also assess the provider's customer service: how quickly it responds to questions and how well it resolves problems.

Designing An Accurate Data Test

Purchasing information, especially large amounts of information, should be treated much like any other large purchasing decision. Competitive bids should be reviewed with respect to what they offer and their associated costs. All data providers should offer you the ability to send a test file through their system before you actually purchase the information. This is your "test drive" of the data and you want it to be as accurate and informative as possible.

To do so, you must design the test properly. Apply the following rules to create a fair, unbiased and statistically sound sample of test records. Ignoring these guidelines almost guarantees misleading test results.

Rule One:
Choose all test records at random. Do not pick the first 1,000 records in your file or every 10th one; instead, use a random number generator or a random number table to ensure a truly random sample. This may sound tedious, but it is easy to program, so little effort is required on your part.
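
A minimal Python sketch of Rule One, assuming your records are already loaded into a list. The function name `random_test_sample` and the fixed seed are illustrative only. `random.sample` draws without replacement, so every record has the same chance of selection and none is picked twice; seeding the generator lets you reproduce the exact same test file later if a provider questions the results.

```python
import random

def random_test_sample(records, sample_size, seed=None):
    """Draw a simple random sample of records without replacement.

    Unlike taking the first N records or every k-th record, every
    record has the same chance of being selected.
    """
    if sample_size > len(records):
        raise ValueError("sample size exceeds the number of records")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(records, sample_size)

# Usage: pick 1,000 test records from a hypothetical 250,000-record file.
all_records = [f"customer-{i}" for i in range(250_000)]
test_file = random_test_sample(all_records, 1_000, seed=42)
print(len(test_file))  # 1000
```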

Rule Two:
Ensure that every record in the population has the same chance of being selected as any other record. This means you cannot create a test file in which most of the people live in New Jersey if you are measuring nationwide data. Likewise, if you are measuring data for New Jersey alone, your test file must represent the whole state, not just a few cities. This sounds obvious, but it is one of the most common failings of data tests.
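
One way to catch the kind of skew Rule Two warns about is to compare each region's share of the test file with its share of the full population. The sketch below assumes each record is a dictionary with a hypothetical `state` field, and the 2-point `tolerance` is an arbitrary example; small samples vary more by chance, so loosen it for them. This check cannot prove the selection was random, but it flags a test file that is obviously unrepresentative.

```python
from collections import Counter

def shares(records, key):
    """Return each group's fraction of the records, e.g. {'NJ': 0.5, ...}."""
    counts = Counter(key(r) for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

def representation_gaps(population, sample, key, tolerance=0.02):
    """Return groups whose share in the sample differs from their share in
    the population by more than `tolerance` (absolute difference)."""
    expected = shares(population, key)
    observed = shares(sample, key)
    gaps = {}
    for group, pop_share in expected.items():
        sample_share = observed.get(group, 0.0)  # group missing from sample
        if abs(sample_share - pop_share) > tolerance:
            gaps[group] = (round(pop_share, 3), round(sample_share, 3))
    return gaps

# Usage: a nationwide population, but a test file drawn mostly from New Jersey.
population = [{"state": "NJ"}] * 50 + [{"state": "NY"}] * 30 + [{"state": "CA"}] * 20
biased_sample = [{"state": "NJ"}] * 9 + [{"state": "NY"}] * 1
print(representation_gaps(population, biased_sample, key=lambda r: r["state"]))
```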

Rule Three:
Choose a test file large enough to allow for chance variation in the random selection of records, but not so large that any provider has trouble processing it. The test should be provided free or at a small cost. The table below gives rule-of-thumb test file sizes.

Number of Records in Database    Adequate Test File Size
0 - 100,000                      2,000
100,001 - 250,000                10,000
250,001 - 1 million              50,000
Over 1 million                   100,000
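
The table above can be turned into a simple lookup, sketched here in Python. The helper names are hypothetical, and testing a database smaller than its suggested size in full is an assumption rather than part of the table. The `margin_of_error` helper is the standard normal-approximation formula for a proportion (not from the original table); it gives a rough sense of how much random noise remains in a measured hit rate at each test size.

```python
import math

# Rule-of-thumb tiers from the table above:
# (largest database size in the tier, suggested test file size)
TEST_SIZE_TIERS = [
    (100_000, 2_000),
    (250_000, 10_000),
    (1_000_000, 50_000),
]
OVER_ONE_MILLION = 100_000

def suggested_test_size(database_records):
    """Look up the rule-of-thumb test file size for a database of this size."""
    for upper_bound, test_size in TEST_SIZE_TIERS:
        if database_records <= upper_bound:
            # Assumption: a database smaller than the suggested size is tested in full.
            return min(test_size, database_records)
    return OVER_ONE_MILLION

def margin_of_error(sample_size, hit_rate=0.5, z=1.96):
    """Approximate 95% margin of error for a measured hit rate
    (normal approximation; hit_rate=0.5 gives the widest margin)."""
    return z * math.sqrt(hit_rate * (1 - hit_rate) / sample_size)

for records in (80_000, 200_000, 750_000, 5_000_000):
    n = suggested_test_size(records)
    print(f"{records:>9,} records -> test {n:>7,} (+/- {margin_of_error(n):.1%})")
```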

Rule Four:
Do not use an exact match as the accuracy criterion for every element unless it is absolutely necessary. Instead, choose a range of accuracy that falls within acceptable parameters; the acceptable level of error should vary by element and by how you will use the data.

No data is 100% accurate, so design your tests to allow for some range of error. What matters is that you measure the data the way you intend to use it. For example, take the element "Individual Age" and consider the following results from a 1,000-record test file:

Provider    Direct Hits    Hits Within +/- 2 Years
One         500            575
Two         300            650
Three       250            800

Note that if you counted only exact matches, you would probably choose Provider One. But if an error of plus or minus two years is acceptable for your purposes, Provider Three delivers the best data (800 hits versus 575 for Provider One), and choosing Provider One would be the wrong purchase decision.
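
The comparison above can be sketched in Python, assuming you hold verified ages for your test records to score each provider against (the function names are hypothetical). `count_hits` produces the two columns of the table for one provider, and `best_provider` shows how the choice flips depending on which column you rank by.

```python
def count_hits(true_ages, provider_ages, tolerance=2):
    """Return (direct hits, hits within +/- tolerance years) for one provider.

    A missing value (None) from the provider counts as a miss.
    """
    if len(true_ages) != len(provider_ages):
        raise ValueError("scores must cover the same test records")
    direct = within = 0
    for actual, reported in zip(true_ages, provider_ages):
        if reported is None:
            continue
        if reported == actual:
            direct += 1
        if abs(reported - actual) <= tolerance:
            within += 1  # exact matches also count as within range
    return direct, within

def best_provider(results, use_range):
    """Pick the provider with the most hits under the chosen criterion.

    `results` maps provider name -> (direct hits, hits within range).
    """
    index = 1 if use_range else 0
    return max(results, key=lambda name: results[name][index])

# The counts from the 1,000-record age test above.
age_test = {"One": (500, 575), "Two": (300, 650), "Three": (250, 800)}
print(best_provider(age_test, use_range=False))  # One
print(best_provider(age_test, use_range=True))   # Three
```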

The testing ranges you choose may be the most significant parameter in your tests, so pick your acceptable ranges carefully.
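
Because the acceptable range differs by element, it can help to write each range down as an explicit rule before scoring any provider. In this Python sketch the element names (`age`, `income`, `zip_code`), their tolerances and the test values are all hypothetical examples; set your own to match how you will use the data.

```python
# Hypothetical per-element acceptance rules: each maps (actual, reported) -> bool.
ACCEPTANCE_RULES = {
    "age": lambda actual, reported: abs(reported - actual) <= 2,                # +/- 2 years
    "income": lambda actual, reported: abs(reported - actual) <= 0.10 * actual,  # within 10%
    "zip_code": lambda actual, reported: reported == actual,                     # exact match only
}

def element_hit_rates(pairs_by_element, rules=ACCEPTANCE_RULES):
    """Compute the share of acceptable matches for each element.

    `pairs_by_element` maps an element name to a list of (actual, reported)
    pairs; a reported value of None counts as a miss.
    """
    rates = {}
    for element, pairs in pairs_by_element.items():
        accept = rules[element]
        hits = sum(1 for actual, reported in pairs
                   if reported is not None and accept(actual, reported))
        rates[element] = hits / len(pairs) if pairs else 0.0
    return rates

# Usage with made-up test results for one provider.
test_results = {
    "age": [(34, 35), (52, 52), (41, 43), (29, None)],
    "income": [(60_000, 64_000), (85_000, 70_000)],
    "zip_code": [("07030", "07030"), ("08540", "08542")],
}
print(element_hit_rates(test_results))
```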

Testing The Test File

If you have followed the steps above, you are now ready to test the providers' data. It is often wise to have an outside company run the test files and perform the accuracy checking and analysis with your input, especially if you do not have in-house analysts to handle the mathematics.
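
Part of that analysis is deciding whether one provider is really better or just got luckier on this particular test file. Because every provider processes the same test records, the results are paired, so McNemar's test (a standard test for paired yes/no outcomes) is a better fit than comparing the two hit rates as if they came from independent samples. The sketch below uses the normal-approximation form without continuity correction, which is reasonable when the providers disagree on at least a few dozen records; the per-record results in the usage example are invented for illustration.

```python
import math

def mcnemar_z(hits_a, hits_b):
    """McNemar's test statistic (as a z score) for two providers scored
    on the same test records.

    hits_a and hits_b are equal-length lists of booleans: whether each
    provider's value was acceptable for that record. Only records on
    which the providers disagree carry information about which is better.
    """
    if len(hits_a) != len(hits_b):
        raise ValueError("both providers must be scored on the same records")
    a_only = sum(1 for a, b in zip(hits_a, hits_b) if a and not b)
    b_only = sum(1 for a, b in zip(hits_a, hits_b) if b and not a)
    if a_only + b_only == 0:
        return 0.0  # the providers never disagree
    return (a_only - b_only) / math.sqrt(a_only + b_only)

def significantly_better(hits_a, hits_b, critical=1.96):
    """True if provider A beats provider B at roughly the 5% level (two-sided)."""
    return mcnemar_z(hits_a, hits_b) > critical

# Hypothetical per-record results for 1,000 test records:
# 600 acceptable for both, 200 only for A, 50 only for B, 150 for neither.
provider_a = [True] * 600 + [True] * 200 + [False] * 50 + [False] * 150
provider_b = [True] * 600 + [False] * 200 + [True] * 50 + [False] * 150
print(round(mcnemar_z(provider_a, provider_b), 2))
print(significantly_better(provider_a, provider_b))
```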