This is Part 5 of a series of posts on putting mental models to practice. In Part 1 I described my problem with Munger's latticework of mental models in service of decision-making after I applied it to my life. In Part 2 I argued that the study of rationality is a good place to start for a framework of practice. We learnt that Farnam Street's list of mental models consists of three types of models: descriptive models, thinking models concerned with judgment (epistemic rationality), and thinking models concerned with decision making (instrumental rationality).
I'm revisiting this article with the context of GenAI being what it is in 2025. Some important questions I'm ruminating on that I'd love to discuss with the brilliant minds here:
Can CDM/skill extraction/other TBD methods be a viable foundation for capturing, organizing, and storing all the knowledge held in an organization?
How can we use LLMs to extract knowledge from individuals in an organization at scale?
Do any current enterprise data models support knowledge extraction and management, or do we need to invent something new?
I am not aware of any work trying to do that, but I do think these things are interesting to try. I was thinking about this in the context of the customer demand series too. I don't think you necessarily need different kinds of models, but you will need some prompting or fine tuning.
Randomly related to the practice theme, I just started watching Nathan Fielder's TV series The Rehearsal, where he gets people to practise difficult life decisions with actors, building up complex flowcharts, to see if it makes decisions easier or more effective for them.
Yeah I've been thinking about this for a bit also, but if it happens, I'm not sure if it's going to be for the same reasons that people commonly cite.
There was an attempt to explicate tacit knowledge and put it into expert systems in the 90s that crashed and burned. I think the field was called knowledge management, and it is thought to have mostly failed. You may read more details from @joepairman here.
I haven't studied that attempt very deeply, so I'm not sure why it failed. But I suspect tacit knowledge extraction in the current era will be more aligned with the Cognitive Systems Engineering approach. That is: study what experts do, and then design joint cognitive systems to highlight cues and expectancies that experts pay attention to, so that human users benefit from the expertise extraction. I'm not entirely sure how AI may help with this approach, but there should be something here.
A good example of this might be Wendy Jephson's work with Nasdaq. NDM podcast episode here; paper here.
Edited to add: I found a copy on the author's website and uploaded it below; alas, many of the details of the Nasdaq system are left out of the paper, for obvious reasons.
Thanks for mentioning that thread, @cedric. Knowledge Management isn't dead; in fact, as pure Gen AI initiatives struggle, there's a bit more interest. (As a hint why: the improvisational, "good-enough" flavor of Generative AI is often far more reliable and useful if it's applied around a framework of explicit knowledge models.)
A Gartner consultant mentioned to me a few months ago that a client had come to him wondering how to make (Gen) AI actually pay off for them. The client said "I hear I need to do knowledge management; what's that?" (In case of any doubt, a Gartner consultant is not the best source to learn about knowledge management. They are excellent, however, for learning about the business models called "pay to play" and "double dip".)
But for sure, given that tacit knowledge gets a mention in every KM book, even the most technology-oriented ones, that aspect of KM has certainly failed. Tacit knowledge is gained and passed on in the same ad-hoc way it always has been. Probably passed on less, now that there are fewer cues from a physical work environment. (There is a lot of truth behind the concept of "embodied cognition", even if a lot of academic BS surrounds it.)
As for why codifying tacit knowledge hasn't taken off? Probably because it makes people think too hard. And the results aren't immediately visible.
Thank you for the link to the Jephson paper! Very useful and pithy.
I wonder what would happen if I tried to record what I know about good and bad ways to manage information tooling and projects. There are so many recurring patterns, but they're not always obvious. I recognize them mostly through a gut feel, though I can articulate some of them OK.
Thanks for all the comments and resources. I've gone down quite the rabbit hole this morning digging through this. The way I'm thinking about my current goals in learning more about this:
I strongly concur with the now well-trodden opinion that AI is unlikely to supplant most jobs, but rather will carve off pieces of jobs until the "uniquely human" components remain. For example, in a customer service setting, AI will take on the role of transcriber, summarizer, fact recaller, etc.
AI can recognize patterns and provide decision support. Continuing the customer service example, if a user calls in to diagnose a syncing problem with their file sharing system, AI can aggregate and present "cases" recorded in past support issues to prime the call center associate to recognize an issue that's been solved before and offer avenues to attempt a resolution.
Over time, many of these inbound calls would become relatively rote and could eventually be fully automated; for the exceptions, that's where CTA/NDM starts to come into play. In our call center example, deep troubleshooting becomes the core skill in play, and as discussed in this thread (https://forum.commoncog.com/t/troubleshooting-as-a-perennial-skill/2574) it's where humans beat AI.
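To make that decision-support step a bit more concrete, here's a minimal sketch of surfacing precedent cases for a new issue and routing rote vs. exceptional calls. Everything here is illustrative: `embed` stands in for whatever embedding model you'd actually use, and the threshold is made up.

```python
# Toy sketch: surface similar past support cases for a new inbound issue, and
# route it (auto-resolve vs. escalate to a human) based on how close the best
# precedent is. `embed` is a placeholder, not a real model; threshold is invented.
import math

def embed(text: str) -> list[float]:
    raise NotImplementedError("plug in a real embedding model here")

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def triage(new_issue: str, past_cases: list[dict], auto_threshold: float = 0.9):
    """Rank past cases by similarity to the new issue, then route it."""
    q = embed(new_issue)
    ranked = sorted(past_cases,
                    key=lambda c: cosine(q, embed(c["description"])),
                    reverse=True)
    best = cosine(q, embed(ranked[0]["description"])) if ranked else 0.0
    route = "auto-resolve" if best >= auto_threshold else "escalate to human"
    return route, ranked[:3]
```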
Our problem has now shifted from one of "capture all the information in a KMS" to "identify the uniquely human skills/scenarios in a job, and codify how to accelerate expertise in these skills".
I still feel like there's something there with GenAI/LLMs in particular: could a human/AI pair complete CTA at scale more effectively?
Well, there's an ongoing research project by the Schmidt Futures foundation to use AI paired with humans to speed up CTA. I think it's a separate program, running in parallel to this one: https://naturalisticdecisionmaking.org/cta-in-eaffect/
I'm not entirely sure how it's going, but I can reach out to the NDM folks to ask.
@Jonathan, I think the "capture all the information in a KMS" idea is the 90s/2000s dream that Cedric referred to as crashing and burning.
And about CTA and tacit knowledge, I agree with you about the ongoing role of humans.
But I wanted to be a bit more specific about knowledge models, since I realized that I hadn't answered these points directly:
I think these questions go beyond skill extraction. Knowledge includes knowledge about products and organizational structures, and all sorts of other networks of facts. And there are many related data models. So many that you could choose several different ways just to classify them:
Abstract or formal structure: Property-based/relational? Hierarchical? (With polyhierarchy?) If hierarchical, does the hierarchy signify membership, or constituent parts, or some kind of rank? If not hierarchical, what other groupings exist? (We can say "graph", but that's really a lowest common denominator. You can represent pretty much every other data structure with a graph.)
Purpose: is this for finding stuff, recommending stuff, prioritizing work, sense-making, or something else? Are we describing content, or is the data actually the content itself?
(If the specific goal is capturing skill-based knowledge, I still like the idea of Concept Mapping as outlined by Gary Klein in Working Minds, which is basically a simple ontology that could be serialized either as triples or via property graph structure; a toy sketch follows this list. Though I have not tried concept mapping for its intended purpose of CTA.)
Longevity: is this short-lived or long?
Technical syntax / serialization: one of many forms of JSON/XML, or relational data, or some "pure graph" structure?
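To make the concept-mapping item above concrete, here's a toy sketch of a Klein-style concept map serialized as subject-relation-object triples. The concepts and relation labels are invented for illustration; a real map would come out of a concept-mapping session with an expert.

```python
# A toy concept map serialized as (subject, relation, object) triples.
# Concepts and relations are invented, purely to show the shape of the data.
triples = [
    ("sync failure", "is diagnosed by", "checking client logs"),
    ("sync failure", "is often caused by", "stale auth token"),
    ("stale auth token", "is resolved by", "forcing re-authentication"),
    ("checking client logs", "requires", "log access permissions"),
]

def neighbors(concept, triples):
    """Return every (relation, object) pair attached to a concept."""
    return [(rel, obj) for subj, rel, obj in triples if subj == concept]

for rel, obj in neighbors("sync failure", triples):
    print(f"sync failure --{rel}--> {obj}")
```

The same structure maps fairly directly onto RDF triples or onto nodes and edges in a property graph, which is part of why "graph" is such a lowest common denominator.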
One good viewpoint for the more human side of all this, ranging from librarian-style taxonomies to Cynefin, is Patrick Lambe's great "Organising Knowledge", quite expensive to buy outright but apparently readable with a £12/month subscription here: [PDF] Organising Knowledge by Patrick Lambe | 9781843342281, 9781780632001
And regarding the AI use cases you mentioned of decision support and fact recall, just to note (as I'm sure you also have in mind) that AI is much more than just generative AI. Explicit knowledge models are more popular again as part of AI applications like these (having taken a knock for a couple of years when people overestimated the symbolic capabilities of generative AI). Sometimes people call it Semantic AI, though I think that's a bit limiting. And there is still a big space for more "traditional", task-trained machine learning, and I believe for symbolic approaches to AI that don't necessarily involve explicit knowledge models…
Basically I just wanted to point out that the general field is quite huge! That is, knowledge management / modeling / applications, and related information theory. But on the specific topic of skill extraction, your thoughts make sense to me and I'm interested to read more about this.
I agree. Just about any model that's explicitly defined is going to perform better than "pulling in a bunch of stuff", because you've imbued value into the data by organizing it with intent.
I'm not a data scientist (clearly), but I'd be interested to see what would come out of throwing an LLM at a dataset, giving it the goal of "provide a formal structure for the patterns that emerge" and then having a human take a final pass at the data model. You could then run it through the model again to "fill out" the data model with all the known cases. I'm working on something like this right now in my spare time to teach my son how to get better at baseball.
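A very rough sketch of what that two-pass loop might look like; the prompts and the `call_llm` helper are hypothetical placeholders, not any particular product's API.

```python
# Hypothetical two-pass loop: (1) ask an LLM to propose a schema for a pile of
# raw records, (2) let a human edit that schema, (3) ask the LLM to fill the
# agreed schema out for each record. `call_llm` stands in for whatever
# chat-completion client you actually use; the prompts are illustrative only.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def propose_schema(records: list[str]) -> str:
    sample = "\n---\n".join(records[:20])  # small sample keeps the prompt short
    return call_llm(
        "Here are raw records of the same kind of work:\n" + sample
        + "\n\nPropose a formal structure (field names and types) that captures "
        "the patterns you see. Return it as a JSON schema."
    )

def populate(schema: str, record: str) -> str:
    return call_llm(
        "Using this JSON schema:\n" + schema
        + "\n\nExtract the fields from this record:\n" + record
        + "\n\nReturn only JSON."
    )

# Workflow: schema = propose_schema(records); a human edits the schema;
# structured = [populate(schema, r) for r in records]
```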
The link @cedric provided above featured an interesting case study where researchers performed CTA on auto accident assessors and were able to show drastic improvement in the ability to recognize and classify the types of damage.
Thanks for providing these examples/questions. An interesting thought here is: do we (humans) have to care about this anymore? Or can AI/ML systems be developed where we leave the performance tuning and knowledge structuring to the machines? The reason I specifically called out GenAI is that I'm viewing it as a nearly universal "translator" between humans and machines. I know this has limits in practice, but it won't be long before RAG models do this nearly perfectly.
I want to think out loud through this problem from first principles and ideate on how LLMs might be used for skills extraction at scale. Please challenge me if I'm missing something or making incorrect assumptions!
Thesis
Rather than asking people to explicitly document their expertise (which fails due to the extra work and different skill requirements), an LLM can be deployed in fields where the work itself generates documentation and where we can measure outcomes objectively. This creates a natural dataset for LLM-based expertise extraction. Let me elaborate:
Criterion A: Auto-Documenting Work
What fields have auto-documentation, where documentation is not an afterthought but an integral part of the work itself (if not the entirety of the work product)? This kind of aligns with the thinking in "note-taking makes you a better expert": when information creation is core to the job, it naturally captures the application of skill.
I'm going down this avenue because my experience with knowledge management (limited experience but mostly negative) has consistently looked like this:
Each KM system requires supplementary work beyond core responsibilities
Creating that documentation demands different expertise than the actual job
As a result, it either gets disregarded/ignored, or if there is a mandate around it, it gets managed by separate processes or people (sometimes becoming compliance theater rather than expertise capture). If not compliance theater, then those KM systems shift to focus on training and onboarding, and thus are again not directly representative of any organizational expertise.
Criterion B: Measurable Outcomes
As a corollary to auto-documenting skills application, there needs to be a clear connection between work artifacts and results. This gives us objective measures of expertise rather than subjective assessments.
Yes, establishing causation can be challenging in complex systems, but my guess is that sufficient data over time can identify meaningful correlations in a domain. If those correlations can't be established (or, on meta-analysis, reflect a system that's being gamed), then exploration should shift to a new avenue.
LLMs & Skill Extraction: Solving One Black Box by Applying Another
I want to digress into cybernetics for a second, particularly Stafford Beer's POSIWID (The Purpose of a System is What it Does) and Black Boxes:
According to the cybernetician, the purpose of a system is what it does. This stands for bald fact, which makes a better starting point in seeking understanding than familiar attributions of good intention, prejudices about expectations, moral judgment, or sheer ignorance of circumstances.
Beer suggests not opening black boxes, and evaluating them only by what they do. I'm interested in this framing for skills extraction at scale with LLMs because it aligns neatly with using auto-documented work as representative of skill. If the work is your application of skill and expertise, you don't need to further prod into the human rationalization behind it. The work is the POSIWID'ing black box.
On one hand, it's not impossible to create a surveying mechanism that mimics CTA interviews and feeds that into an LLM to do skill extraction at scale. But on the other hand, if you already have a body of auto-generated documents for LLM ingestion that is representative of the application of skill, and data sets that evaluate the outcomes of those applications, then why not just do that and use the LLM for pattern-matching analysis? A cybernetic "black box" approach means that you can take an artifact of work and disregard intention. I.e., if you're writing code, then the code you've written is your artifact of expertise; your intentions do not matter and only introduce messiness. To invert the cybernetics term: at scale, what you do is your expertise. This:
Eliminates intention as a complicating factor
Allows you to account for exceptional variation over time
Uses observable artifacts to do pattern matching at scale
Makes LLM analysis more tractable
So what can this be applied to?
My professional bias suggests three areas. (I was on the fence about Product Management as a fourth, but I think it's very hard to draw the line from product documents to implementation and impact. Too many confounding variables and weak connections.)
1. Software Engineering
Artifacts: Code and commits
Measurable Outcomes: Using DORA metrics and observability data, you can connect commits to:
Production defects and downtime
Code longevity (as a proxy for quality)
System performance metrics
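As a toy illustration of connecting commits to these outcome signals: the field names and data below are invented; real inputs would come from your VCS and incident/observability tooling.

```python
# Toy sketch: link commits to later production incidents to get a crude
# per-author defect signal. Fields and data are invented for illustration.
from collections import defaultdict

commits = [
    {"sha": "a1", "author": "dev1", "files": {"sync.py"}},
    {"sha": "b2", "author": "dev2", "files": {"auth.py"}},
]
incidents = [
    {"id": "INC-1", "culprit_files": {"auth.py"}},
]

def defect_counts(commits, incidents):
    """Count incidents whose culprit files overlap a commit's touched files."""
    counts = defaultdict(int)
    for c in commits:
        for inc in incidents:
            if c["files"] & inc["culprit_files"]:
                counts[c["author"]] += 1
    return dict(counts)

print(defect_counts(commits, incidents))  # {'dev2': 1}
```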
The limitation is that it could end up measuring "skill at hitting metrics" rather than true engineering expertise.
2. Content Marketing
Artifacts: Written content (articles, interviews, etc.)
Measurable Outcomes: Direct ties to:
Views and engagement
Lead generation
Conversion rates
Time on page
If all you have to work with is written content, and you measure content's business success by a series of engagement metrics, then it feels like a good proxy for expertise. Sub in the metrics that you care about.
3. Healthcare
Every patient interaction creates a chart documenting:
Initial assessment and chosen tests (reflecting clinical judgment)
Interventions selected
Decision-making process
All actions are recorded for compliance, creating a rich dataset that naturally captures the application of clinical expertise, even when controlling for documentation styles.
Application & Processing
Statistical Analysis: Cue a speed-run through the data-driven blog series: I'd lead with Deming's approach to process analysis and identify high-performing vs. low-performing outputs…
LLM Pattern Recognition: … except rather than purely statistical analysis, this is where LLMs could be applied to analyze free-text differences in the work artifacts between performance levels and to identify intrinsic and extrinsic patterns (a toy sketch of this two-step loop follows below).
Do this at scale.
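A rough sketch of that loop: split artifacts by an outcome metric, then ask an LLM to contrast the groups. The `call_llm` helper, prompt wording, and field names are placeholders, not a specific tool's API.

```python
# Hypothetical pipeline: split work artifacts into high/low performers by an
# outcome metric, then ask an LLM what distinguishes the two groups.
import statistics

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def split_by_outcome(artifacts: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split artifacts around the median outcome score."""
    median = statistics.median(a["outcome_score"] for a in artifacts)
    high = [a for a in artifacts if a["outcome_score"] >= median]
    low = [a for a in artifacts if a["outcome_score"] < median]
    return high, low

def sample_text(group: list[dict], n: int = 10) -> str:
    return "\n---\n".join(a["text"] for a in group[:n])

def contrast_groups(high: list[dict], low: list[dict]) -> str:
    """Ask the LLM to describe recurring differences between the two groups."""
    return call_llm(
        "These work artifacts led to good outcomes:\n" + sample_text(high)
        + "\n\nThese led to poor outcomes:\n" + sample_text(low)
        + "\n\nDescribe recurring differences in how the work was done."
    )
```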
As an example, a colleague I've been talking with recently in healthcare is currently testing this approach, albeit semi-manually:
Using clinical documentation and patient outcomes to identify provider performance
Using LLMs to analyze patterns in high-performing documentation
Then using that to create targeted training for individual providers and department groups with poor performance, while using the high-performing artifacts as a benchmark.
Obvious problems with this approach:
Metric Gaming: Risk of optimizing for metrics rather than underlying outcomes
Context Loss: Tacit knowledge may not always appear in documentation, and even auto-documenting work can have some lossiness (e.g., a staff engineer whose expertise is in pair programming and has little individual output).
Causation vs. Correlation: Difficulty establishing true causal relationships
Is this a useful way to think about knowledge extraction at scale with LLMs, or am I falling into well-trod paths that have been tried, tested, and discarded?
I agree this is challenging, but would also say that there are ways LLMs could help here too. Consider my previous example of the call center: an AI "call summarizer" is doing the work of producing documentation based on the conversational inputs of the call center rep and the customer. This could be extended to aggregate and encode any interpersonal communication.
Today we want that call summarizer to produce something human-readable. Is there some future where instead of summarizing it, it encodes it in a way that it becomes a "leaf" on a "tree of knowledge"? This goes back to @joepairman's point about explicit knowledge structures as well.
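One way to picture that: instead of prose, the summarizer emits a small structured record that can be attached to an explicit knowledge model. A hedged sketch, with an invented schema and a placeholder `call_llm` helper:

```python
# Toy sketch: ask the call summarizer for a structured "leaf" instead of prose,
# so each call can be attached to an explicit knowledge structure. The schema,
# prompt, and `call_llm` helper are invented for illustration.
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

LEAF_SCHEMA = {
    "product_area": "string",
    "symptom": "string",
    "root_cause": "string",
    "resolution": "string",
    "resolved": "boolean",
}

def summarize_as_leaf(transcript: str) -> dict:
    """Summarize a call transcript into a record matching LEAF_SCHEMA."""
    raw = call_llm(
        "Summarize this support call as JSON with exactly these fields:\n"
        + json.dumps(LEAF_SCHEMA, indent=2)
        + "\n\nTranscript:\n" + transcript
    )
    return json.loads(raw)
```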
Couldn't agree more. That documentation's value isn't inherent; it's derived from creating value somewhere else (e.g., your "knowledge tree" helps you reduce customer service times from an average of 8 minutes down to 4 minutes, or reduces calls altogether by creating a bot that can answer the questions).
Finally, this takes me to a point that came up in conversation with a colleague today. There's different value in "common knowledge" and "uncommon knowledge," and your method for extracting each would differ. You could imagine gathering "common knowledge" through simple interview bots that ask the same questions to many people (e.g., ask every software engineer in a particular function "How do you set up a CI/CD pipeline for [area]?" or "How do you deploy an application to Kubernetes?"). The shape of this common knowledge would likely converge strongly if it's widely known and well understood. Just to continue butchering my tree metaphor, this would be a highly shaped tree in a well-manicured garden. When tougher questions get asked and the answers are more diversified (or just messier), that's when you bring in humans to apply something like CTA to "create" an artificial tree instead of letting the answers shape it organically.
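A toy way to operationalize the "does it converge?" test: compare everyone's answers to the same question and only route the divergent questions to human-led CTA. Word overlap here is a crude stand-in (in practice you'd probably use embedding similarity), and the threshold is made up.

```python
# Toy sketch: measure whether many people's answers to the same question
# converge (common knowledge) or scatter (candidates for human-led CTA),
# using simple word-overlap similarity. Threshold and data are invented.
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Crude word-overlap similarity between two answers."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0

def convergence(answers: list[str]) -> float:
    """Average pairwise similarity; higher means answers agree more."""
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def route(question: str, answers: list[str], threshold: float = 0.5) -> str:
    label = "common knowledge" if convergence(answers) >= threshold else "route to CTA"
    return f"{question}: {label}"

print(route("How do you deploy an application to Kubernetes?",
            ["build image, push, kubectl apply manifests",
             "build image, push to registry, kubectl apply the manifests"]))
```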
Distilling it down to a single question: is it naive and overly simplistic to think you can build a knowledge corpus with a combination of crowdsourcing and more directed CTA?