Podchaser Logo
Home
399: Data Munging

399: Data Munging

Released Wednesday, 1st February 2023
Good episode? Give it some love!
399: Data Munging

399: Data Munging

399: Data Munging

399: Data Munging

Wednesday, 1st February 2023
Good episode? Give it some love!
Rate Episode

There was a small problem in our database. Some JSON data we kept in a column would sometimes have a string instead of an integer. Like {"tabSize": "5"} instead of {"tabSize": 5} of the like. Investigation on how that happened was just silly stuff like not calling parseInt on a value as it came off a <select> element in the DOM. This problem never surfaced because our Rails app just papered over it. But we're moving our code to Go in when you parse JSON in Go, the struct type that you parse it out into needs to match those types perfectly, or else it panics. We had found that our Go code was working around this in all sorts of ways that felt sloppy and inconsistent.

One way to fix this? Fix any bad data going into the DB, then write a script to fix all the data in the DB. This is exactly the approach I took at first, and it would have absolutely fixed this problem.

But Alex took a step back and looked at the problem a bit wider, and we ended up building some tools that helped us solve this problem, and solve future problems related to this. For one, we built a more permission JSON parser that would not panic on something as easy to fix as a string-as-int problem. This worked by way of some Go reflection that could tell what types the data was supposed to be and coerce them if possible. But what should the value fall back to if it's not savable? That was another tool we built to set the default values of Go structs to be potentially other values than what the defaults for their types are. And since this is all in the realm of data validation, we built another tool to validate the data in Go structs against constraints, so we can always keep the data they contain good.

Once all these tools were in place, the new script to fix the data was much easier to write. Just call the safe JSON function to fix the data and put it back. And the result is a cleaned up code base and tools we can use for data safety for the long term.

Time Jumps

Sponsor: Split

This podcast is powered by Split. The Feature Management & Experimentation Platform that reimagines software delivery. By attaching insightful data to feature flags, Split frees you to quickly deploy, measure, and learn the impact of every feature you release. So you can safely deliver features up to 50 times faster and exhale. What a Release.

Start raising feature flags (and lowering stress). Visit Split.io/CodePen for a free trial.

Show More
Rate

Join Podchaser to...

  • Rate podcasts and episodes
  • Follow podcasts and creators
  • Create podcast and episode lists
  • & much more

Episode Tags

Do you host or manage this podcast?
Claim and edit this page to your liking.
,

Unlock more with Podchaser Pro

  • Audience Insights
  • Contact Information
  • Demographics
  • Charts
  • Sponsor History
  • and More!
Pro Features