Rational, predictive metabolic engineering of organisms requires the ability to associate biological activity with the corresponding gene(s). Despite extensive advances in the 20 years since the Escherichia coli genome was published, there are still gaps in our knowledge of protein function. The substantial amount of published data, including omics-level characterization in a myriad of conditions, genome-scale libraries, and evolution and genome sequencing, provides a means of identifying and prioritizing proteins for characterization. This review describes the scale of this knowledge gap, demonstrates the benefit of addressing it, and shows that interesting candidates for characterization are available.