A Few Thoughts on Core Data

A few thoughts on core data. If the core data is your own creation, you must be clear about where it came from and how you created it. If you are using someone else’s data, you must be clear about how and where you got it. The spreadsheet should clearly state the location of the data so that if someone is using your work, they can go back to that data source and get the same set of numbers you got. If they can’t do this, you have a credibility problem. You don’t want that.

Core data should (and usually does) exist as a data table. This is all about data tables, sometimes called tabular data. A data table is a set of rows and columns that are formally structured. Oh, and another thing. A data table is a set of instance of a thing. And that thing can be almost anything. But it has to be something. You should be able to clearly say “This data table is a listing of X”, and know what X is. This is surprisingly rare, and the reason aside from lack of hard thinking, is that a data table is frequently a messy combination of two different things, because the author did not know how to link together 2 well organized data tables into one report or analysis.

Correcting Bad Data with an Audit Trail Technique

An Excel problem

Someone sends you a file with company and county. The county is obviously wrong on some of them. What do you do? You have a column called “Original County” with their data. This is never changed. Then you have a column called “Corrected County”. Then you copy the data in the “Original” column into the “corrected” column. Then you edit the “Corrected” column. In step form:

  1. Create a column called “Original Data”.
  2. Consider locking these cells and protecting the sheet.
  3. Copy the column called “Original Data” to a new column, and call it “Corrected Data”.
  4. Do all your editing, changing, and correcting in the column labeled “Corrected Data”. Never change anything in the column labeled “Original Data”.

Conceptually, what we are doing here is creating an informal audit trail of our work. If we or another person comes back to this work in a week or a month or a year, we can quickly figure out what was done.