Michael Outlaw is a software developer and co-host of Coding Blocks Podcast.
We learn what to look for in a code review while reviewing Google’s engineering practices documentation as Michael relates patterns to choo-choos, Joe has a “weird voice”, and Allen has a new favorite portion of the show. Are you reading this via your podcast player? You can find this episode’s full show notes at https://www.codingblocks.net/episode133 where you can also join the conversation. Sponsors University of California, Irvine Division of Continuing Education – One of the top 50 nationally ranked universities, UCI offers over 80 certificates and specialized programs designed for working professionals. Registration is NOW OPEN! Sign up and reserve your seat today!Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. Survey Says Anonymous VoteSign in with WordpressHow likely are you to advocate for working from home in the future?After this pandemic? Every. Opportunity. I. Get.After this pandemic? Never.Work from home? Is that even an option?vote News Thank you to everyone that left us a review:iTunes: codewith_himanshu, SpaceDuckets, akirakinskiStitcher: Anonymous (from Croatia), Llanfairpwllgwyngyll (Wikipedia), Murali SuriarWatch Joe solve LeetCode Problems (YouTube) Regarding the OWNERs file … // TODO: Insert Clever Subtitle Here Design This is the MOST IMPORTANT part of the review: the overall design of the changelist (CL).Does the code make sense?Does it belong in the codebase or in a library?Does it meld well with the rest of the system?Is it the right time to add it to the code base? Functionality Does the CL do what it’s supposed to do?Even if it does what it’s supposed to do, is it a good change for the users, both developers and actual end-users?As a reviewer, you should be thinking about all the edge-cases, concurrency issues, and generally just trying to see if any bugs arise just looking at the code.As a reviewer, you can verify the CL if you’d like, or have the developer walk you through the changes (the actual implemented changes rather than just slogging through code).Google specifically calls out parallel programming types of issues that are hard to reason about (even when debugging) especially when it comes to deadlocks and similar types of situations. Complexity This should be checked at every level of the change:Single lines of code,Functions, andClassesToo complex is code that is not easy to understand just looking at the code. Code like this will potentially introduce bugs as developers need to change it in the future. A particular type of complexity is over-engineering, where developers have made the code more generic than it needs to be, or added functionality that isn’t presently needed by the system. Reviewers should be especially vigilant about over-engineering. Encourage developers to solve the problem they know needs to be solved now, not the problem that the developer speculates might need to be solved in the future. The future problem should be solved once it arrives and you can see its actual shape and requirements in the physical universe.Google’s Engineering Practices documentation Tests Usually tests should be added in the same CL as the change, unless the CL is for an emergency.Emergencies were discussed in episode 132.Make sure the tests are correct and useful.Will the tests fail if the code is broken?Are the assertions simple and useful?Are the tests separated appropriately into different test methods? Naming Were good names chosen?A good name is long enough to be useful and not too long to be hard to read, Comments Were the comments clear and understandable, in English?Were the comments necessary?They should explain WHY code exists and NOT what it’s doing.If the code isn’t clear enough on its own, it should be refactored.Exceptions to the rule can include regular expressions and complex algorithms.Comments are different than documentation of code. Code documentation expresses the purpose, usage and behavior of that code. Style Have a style guide. Google has one for most of the languages they use.Make sure the CL follows the style guide.If something isn’t in the style guide, and as the reviewer you want to comment on the CL to make a point about style, prefix your comment with “Nit”.DO NOT BLOCK PR’s based on personal style preference!Style changes should not be mixed in with “real” changes. Those should be a separate CL. Consistency Google indicates that if existing code conflicts with the style guide, the style guide wins.If the style guide is a recommendation rather than a hard requirement, it’s a judgement call on whether to follow the guide or existing code.If no style guide applies, the CL should remain consistent with existing code.Use TODO statements for cleaning up existing code if outside the scope of the CL. Documentation If the CL changes any significant portion of builds, interactions, tests, etc., then appropriate README’s, reference docs, etc. should be updated.If the CL deprecates portions of the documentation, that should also likely be removed. Every Line Look over every line of non-generated, human written code.You need to at least understand what the code is doing.If you’re having a hard time examining the code in a timely fashion, you may want to ask the developer to walk you through it.If you can’t understand it, it’s very likely future developers won’t either, so getting clarification is good for everyone.If you don’t feel qualified to be the only reviewer, make sure someone else reviews the CL who is qualified, especially when you’re dealing with sensitive subjects such as security, concurrency, accessibility, internationalization, etc. Context Sometimes you need to back up to get a bigger view of what’s changing, rather than just looking at the individual lines that changed.Seeing the whole file versus the few lines that were changed might reveal that 5 lines were added to a 200 line method which likely needs to be revisited.Is the CL improving the health of the system?Is the CL complicating the system?Is the CL making the system more tested or less tested?“Don’t accept CLs that degrade the code health of the system.”Most systems become complex through many small changes. Good Things If you see something good in a CL, let the author know.Many times we focus on mistakes as reviewers, but some positive reinforcement may actually be more valuable.Especially true when mentoring. Resources We Like OWNERS files (chormium.googlesource.com)Modern Code Review: A Case Study at Google (research.google)Google Engineering Practices Documentation (GitHub)What to look for in a code review (GitHub)Comparing Git Workflows (episode 90)Google Style Guides (GitHub)Perl Special Variables Quick Reference (PerlMonks)Email Address Regular Expression That 99.99% Works. (emailregex.com) Tip of the Week List of common misconceptions (Wikipedia)The unofficial extension that integrates Draw.io into VS Code. (marketplace.visualstudio.com)Use Dataproc’s Cluster properties to easily update XML settings. (cloud.google.com)Bonus tip: Include a Dockerfile (or Docker Compose) file with your open source project to help it gain traction.
We dig into Google’s engineering practices documentation as we learn how to code review while Michael, er, Fives is done with proper nouns, Allen can’t get his pull request approved, and Joe prefers to take the average of his code reviews. In case you’re reading this via your podcast player, this episode’s full show notes can be found at https://www.codingblocks.net/episode132. Be sure to check it out and join the conversation. Sponsors University of California, Irvine Division of Continuing Education – One of the top 50 nationally ranked universities, UCI offers over 80 certificates and specialized programs designed for working professionals. Registration is NOW OPEN! Sign up and reserve your seat today!Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. Survey Says Anonymous VoteSign in with WordpressDo you *always* include (new or updated) unit tests with your pull requests?Yes, always, of course. I'm not a psychopath.Not with *every* pull request, but more often than not. I mean, what kind of psychopath has the time to do them for *every* pull request?No, I rarely include any tests with my pull requests. My friends think I'm crazy.vote News Thank you to everyone that left us a review:iTunes: Jbarger, Podcast Devourer, DuracceStitcher: Daemyon C How to Code Review Code Review Developer Guide Q: What is a code review?A: When someone other than the author of the code examines that code. Q: But why code review?A: To ensure high quality standards for code as well as helping ensure more maintainable code. What should code reviewers look for? Design: Is the code well-designed and appropriate for your system?Functionality: Does the code behave as the author likely intended? Is the way the code behaves good for its users?Complexity: Could the code be made simpler? Would another developer be able to easily understand and use this code when they come across it in the future?Tests: Does the code have correct and well-designed automated tests?Naming: Did the developer choose clear names for variables, classes, methods, etc.?Comments: Are the comments clear and useful?Style: Does the code follow our style guides?Documentation: Did the developer also update relevant documentation? Picking the Best Reviewers Get the best reviewer you can, someone who can review your code within the appropriate time frame.The best reviewer is the one who can give you the most thorough review.This might or might not be people in the OWNERS file.Different people might need to review different portions of your changes for the same pull request.If the “best” person isn’t available, they should still be CC’d on the change list. In Person Reviews If you pair-programmed with someone who was the right person for a code review, then the code is considered reviewed.You can also do code reviews where the reviewer asks questions and the coder only speaks when responding to the questions. How to do a Code Review The Standard of a Code Review The purpose of the code review is to make sure code quality is improving over time. There are trade-offs:Developers need to actually be able to complete some tasks.If reviewers are a pain to work with, for example they are overly critical, then folks will be less incentivized to make good improvements or ask for good reviews in the future.It is still the duty of the reviewer to make sure the code is good quality. You don’t want the health of the product or code base to degrade over time.The reviewer has ownership and responsibility over the code they’re reviewing. Reviewers should favor approving the changes when the code health is improved even if the changes aren’t perfect. There’s no such thing as perfect code, just better code. Reviewers can actually reject a set of changes even if it’s quality code if they feel it doesn’t belong in “their” system.Reviewers should not seek perfection but they should seek constant improvement.This doesn’t mean that reviewers must stay silent. They can point out things in a comment using a prefix such as “Nit”, indicating something that could be better but doesn’t block the overall change request. Code that worsens the overall quality or health of a system should not be admitted unless it’s under extreme/emergency circumstances. What constitutes an emergency? A small change that: Allows a major launch to continue,Fixes a significant production bug impacting users,Addresses a legal issue, orPatches a security hole. What does not constitute an emergency? You want the change in sooner rather than later.You’ve worked hard on the feature for a long time.The reviewers are away or in another timezone.Because it’s Friday and you want the code merged in before the weekend.A manager says that it has to be merged in today because of a soft deadline.Rolling back causes test failures or breaks the build. Mentoring Code reviews can absolutely be used as a tool for mentoring, for example teaching design patterns, explaining algorithms, etc., but if it’s not something that needs to be changed for the PR to be completed, note it as a “Nit” or “Note”. Principles Technical facts and data overrule opinions and/or preferences.The style guide is the authority. If it’s not in the style guide, it should be based on previous coding style already in the code, otherwise it’s personal preference.The reviewer may request the code follow existing patterns in the code base if there isn’t a style guide. Resolving Conflicts If there are conflicts between the coder and reviewer, they should first attempt to come to a consensus based on the information discussed here as well as what’s in the CL Author’s Guide or the Reviewer Guide.If the conflict remains, it’s probably worth having a face to face to discuss the issues and then make sure notes are taken to put on the code review for future reference and readers.If the conflict still remains, then it’s time to escalate to a team discussion, potentially having a team leader weigh in on the decision. NEVER let a change sit around just because the reviewer and coder can’t come to an agreement. Resources We Like Google Engineering Practices Documentation (GitHub)Code Review Developer Guide (GitHub)How to do a code review (GitHub)The Standard of Code Review (GitHub)Emergencies (GitHub)The CL author’s guide to getting through code review (GitHub)Technical Writing Courses (developers.google.com)Ruffles Potato Chips, Cheddar and Sour Cream (Amazon) Flawless Execution Tip of the Week William Lin’s competitive programming channel (YouTube)Register for the free Microsoft Build digital event, May 19-20. (register.build.microsoft.com)Apple to host virtual Worldwide Developers Conference beginning June 22 (Apple)Checkstyle helps Java developers adhere to a coding standard. (checkstyle.sourceforge.io)CheckStyle-IDEA – An IDEA plugin that uses Checkstyle but isn’t officially part of it. (plugins.jetbrains.com)Black – The uncompromising code formatter for Python. (pypi.org)
We gather around the water cooler at 6 foot distances as Michael and Joe aren’t sure what they streamed, we finally learn who has the best fries, at least in the US, and Allen doesn’t understand evenly distributing your condiments. For those reading this via their podcast player, this episode’s full show notes can be found at https://www.codingblocks.net/episode131. Stop by and join in on the conversation. Survey Says Anonymous VoteSign in with WordpressAre you staying sane during these stay-at-home orders?Yes. Or so the voices tell me.No. But was I ever sane?vote News We really appreciate the latest reviews, so thank you!iTunes: Braver1996summer, eleneshector, Dis14JoeStitcher: Nik P, Anonymous, Please HelP, Dis14Joe, thephdwasamistakeBe on the lookout for live streams of Joe on YouTube or Twitch! Heard Around the Water Cooler COVID-19 Pushes Up Internet Use 70% And Streaming More Than 12%, First Figures Reveal (Forbes)Security at Zoom (Zoom)Joe has been busy live streaming (YouTube)Come learn Apache Drill with us! (YouTube)Cmder – Portable console emulator for Windows.We’re still learning the keyboard shortcuts.30-Day LeetCoding Challenge (LeetCode.com)Codewars – Achieve mastery through challenge.Conway’s Game of Life (Wikipedia) by John Horton Conway (Wikipedia)Coding Interview Tips, How to get better at technical interviews without practicing (InterviewCake.com) True descriptions of languages (Reddit) Allen upgrades to the AMD Ryzen 9 3900xAMD Ryzen 9 3900X 12-core, 24-thread CPU (Amazon)Asus TUF A15 laptop review: AMD’s Ryzen 4000 is a groundbreaking mobile CPU (Eurogamer.net)Big data has been on our minds lately.Data lake (Wikipedia)Apache HadoopApache CassandraApache ParquetGoogle Cloud BigtableUber’s Big Data Platform: 100+ Petabytes with Minute Latency (eng.uber.com) Tip of the Week Interested in COBOL, game development, and Dvorak keyboards? Check out Joe’s new favorite streamer Zorchenhimer. (Twitch)Using helm uninstall doesn’t remove persistent volumes nor their claims.After doing helm uninstall RELEASE_NAME, delete the persistent volume claim using kubectl delete pvc PVC_NAME to remove the claim, which depending on the storage class and reclaim policy, will also remove the persistent volume. Otherwise, you’d need to manually remove the persistent volume using kubectl delete pv PV-NAME.kafkacat – A generic non-JVM producer and consumer for Apache Kafka. (GitHub)
We dig into the details of how databases use B-trees as we continue our discussion of Designing Data-Intensive Applications while Michael’s description of median is awful, live streaming isn’t for Allen, and Joe really wants to bring us back from the break. For those reading this via their podcast player, this episode’s full show notes can be found at https://www.codingblocks.net/episode130 in all their glory. Check it out, as Joe would say, and join the conversation. Sponsors Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after install the agent. Survey Says Anonymous VoteSign in with WordpressWhat tools are you using to ease WFH?RDPVPNCitrixZoomSkypeSlackWebexHangoutsTeamsDiscordPidginJitsiMessagesFaceTimejoin.meGoToMyPCvote News We really appreciate the latest reviews, so thank you!iTunes: Anips79, Jacobboyd23, LoadedGalaxy, JenT Avid ListenerPluralsight is free for the month of April! (Pluralsight)TechSmith is offering Snagit and Video Review for free through June 2020. (TechSmith)Remember when we gushed over Zoom?Zoom: Every security issue uncovered in the video chat app (CNET)Zoom Go Boom (TWiT)Maybe should use Jitsi instead of Zoom. (Jitsi)Be on the lookout for live streams of Joe on YouTube or Twitch! B-Trees are Awesome B-trees are the most commonly used indexing structure.Introduced in 1970, and called ubiquitous 10 years later.They are the implementation used by most relational database systems, as well as a number of non-relational DB’s.“Indexing” is the way databases store metadata about your data to make quick look ups.Like the SSTable, the B-tree stores key/value pairs sorted by key. This makes range query look ups quick.B-trees use fixed block sizes, referred to as pages, that are usually 4 KB in size which (generally) map well to the underlying hardware because disks are typically arranged in fixed block sizes.Every page has an address that can be referenced from other pages. These are pointers to positions on a disk.Knowing (or being able to quickly find) which page the data you are looking for is in, drastically cuts down on the amount of data you have to scan through.B-trees start with a root page. All key searches start here.This root will contain references to child pages based off of key ranges.The child pages might contain more references to other child pages based off of more narrowly focused key ranges.This continues until you reach the page that has the data for the key you searched for.These pages are called leaf pages, where the values live along with the key.The branching factor is the number of references to child pages in one page of a B-tree.The branching factor is tied to the space needed to store the page references and the range boundaries.The book states that it’s common to have a branching factor of several hundred, some even say low thousands!The higher the branching factor means the fewer levels you have to go through, i.e. less pages you have to scan, when looking for your data.Updating a value in a B-tree can be complicated.You search for the leaf node containing the key and then update the value and write it to disk.Assuming everything fits in the page, then none of the upstream references change and everything is still valid.If you are inserting a new key, you find the leaf node where the key should live based on the ranges and then you add the key and value there.Again, if everything fits in the page, then similar to the update, none of the upstream references need to change.However, if the key/value would exceed the size of the page, the page is split into two half-pages, and the parent page’s references are updated to point to the new pages.This update to the parent page might require it to also be split.And this update/split pattern might continue up to and including the root page.By splitting the pages into halves as data is added that exceeds the page size, this keeps the tree balanced.A balanced tree is the secret to consistent lookup times.It terms of big-O, a B-tree with n keys has a depth of O(log n).Most DB’s only go 3 to 4 levels deep.A tree with four levels, using a 4 KB page size, and a branching factor of 500 can store up to 256 TB! Making B-Trees Reliable The main notion is that writes in a B-tree occur in the same location as the original page, that way no references have to change, assuming the page size isn’t exceeded.Think of this as a hardware operation.These actually map to spinning drives better than SSD’s. SSD’s must rewrite large blocks of a storage chip at a time.Because some operations require multiple pages to be written, in the case of splitting full pages and updating the parent, it can be dangerous because if there is a DB crash at any point during the writing of the pages, you can end up with orphaned pages.To combat this, implementations usually include a write-ahead log (WAL, aka a redo log).This is an append-only file where all modifications go before the tree is updated.If the database crashes, this file is read first and used to put the DB back in a good, consistent state.Another issue is that of concurrency.Multiple threads reading and writing to the B-tree at the same time could read things that would be in an inconsistent state.In order to counter this problem, latches, or lightweight locks, are typically used. B-Tree Optimizations Some databases use a copy-on-write scheme. This alleviates the need to write to an append only log like previously mentioned and instead you write each updated page to a new location including updated parents that point to it.In some cases, abbreviated keys can be stored which saves space and would allow for more branching but fewer node levels, which is fewer hops to get to the leaf nodes.This is technically a B+ tree.Some implementations attempt to keep leaf pages next to each other in sequential order which would improve the seek speed to the data.Some implementations keep additional pointers, such as references to the previous and next sibling pages so it’s quicker to scan without having to go back to the parent to find the pointer to those same nodes.Variants like fractal trees, use tactics from log-structured ideas to reduce disk seeks. Comparing B-Trees and LSM-Trees B-trees are much more common and mature. We’ve ironed out the kinks and we understand the ways people use RDBMSes.LSM-trees are typically faster for writes.B-trees are typically faster for reads because LSM-trees have to check multiple data-structures, including SSTables that might be at different levels of compaction.Use cases vary, so benchmarking your use cases are important. LSM-Tree Advantages The write amplification problem:B-trees must write all data at least twice, once to the WAL and another to the page (and again if pages are split). Some storage engines go even further for redundancy.LSM-trees also rewrite data, due to compaction and tree/SSTable merging.This is particularly a problem for SSDs, which don’t do so well with repeated writes to the same segment.LSM-trees typically have better sustained write throughput because they have lower write amplification and because of they generally sequentially write the SSTable files, which is particularly important on HDDs. LSM-trees can be compressed better, and involve less space on disk.LSM-trees also have lower fragmentation on writes. LSM-Tree Downsides Compaction of the SSTables can affect performance, even though the compaction can happen in another thread, because takes up disk I/O resources, i.e. the disk has a finite amount of I/O bandwidth.It’s possible that the compaction can not be keep up with incoming events, causing you to run out of disk space, which also slows down reads as more SSTable files need to be read.This problem is magnified in a LSM-tree because a key can exist multiple times (before compaction) unlike B-trees which have just one location for a given key.The B-tree method for updating also makes it easier for B-trees to guarantee transactional isolation. Resources We Like Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon)Data Structures – (some) Trees (episode 97)B-Tree Visualization (USFCA)SQL Server Transaction Log Architecture and Management Guide (docs.microsoft.com)Log-structured merge-tree (Wikipedia)Postgres Indexes Under the Hood (rcoh.me)Is Docker the new Git? (Coding Blocks) Tip of the Week Chocolatey adds a PowerShell command Update-SessionEnvironment or refreshenv for short, that you can use to update the environment variables in your current PowerShell session, much like . $HOME/.profile for MacOS/Linux. (Chocolatey)Use docker stats to monitor the usage of your running Docker containers. It’s like top for Docker. (Docker)Click the Equivalent REST or command line link at the bottom of the Google Cloud Console to get the equivalent as a command you can script and iterate on.Jupyter has a Docker image for you: Jupyter Docker Stats. (jupter-docker-stacks.readthedocs.io)Apache Drill is an amazing schema-free SQL query engine for Hadoop, NoSQL, and Cloud Storage. (drill.apache.org)Get up and running in minutes with Drill + Docker (drill.apache.org)Presto, aka Presto DB, not to be confused with Presto SQL, is distributed SQL query engine for big data originally developed by Facebook. (prestodb.io)
View 193 more appearances
Share Profile
Get appearance alerts
Subscribe to receive notifications by email whenever this creator appears as a guest on an episode.

Subscribe to receive notifications by email whenever this creator appears as a guest on an episode.

Are you Michael? Verify and edit this page to your liking.

Followers

Recommend This Creator

Recommendation sent

Join Podchaser to...

  • Rate podcasts and episodes
  • Follow podcasts and creators
  • Create podcast and episode lists
  • & much more

Creator Details

Location
Atlanta, GA, USA
Episode Count
197
Podcast Count
1
Total Airtime
1 week, 5 days