Putting Mental Models to Practice Part 5: Skill Extraction

This is Part 5 of a series of posts on putting mental models to practice. In Part 1 I described my problem with Munger's latticework of mental models in service of decision-making after I applied it to my life. In Part 2 I argued that the study of rationality is a good place to start for a framework of practice. We learnt that Farnam Street's list of mental models consists of three types of models: descriptive models, thinking models concerned with judgment (epistemic rationality), and thinking models concerned with decision making (instrumental rationality).


This is a companion discussion topic for the original entry at https://commoncog.com/putting-mental-models-to-practice-part-5-skill-extraction/

I’m revisiting this article with the context of GenAI being what it is in 2025. Some important questions I’m ruminating on that I’d love to discuss with the brilliant minds here:

  • Can CDM/skill extraction/other TBD methods be a viable foundation for capturing, organizing, and storing all the knowledge held in an organization?
  • How can we use LLMs to extract knowledge from individuals in an organization at scale?
  • Do any current enterprise data models support knowledge extraction and management, or do we need to invent something new?
3 Likes

I am not aware of any work trying to do that, but I do think these things are interesting to try. I was thinking about this in the context of the customer demand series too. I don’t think you necessarily need different kinds of models, but you will need some prompting or fine-tuning.

Randomly related to the practice theme, I just started watching Nathan Fielder’s TV series The Rehearsal, where he gets people to practise difficult life decisions with actors, building up complex flowcharts, to see if it makes decisions easier or more effective for them.

3 Likes

Yeah I’ve been thinking about this for a bit also, but if it happens, I’m not sure if it’s going to be for the same reasons that people commonly cite.

There was an attempt to explicate tacit knowledge and put it into expert systems in the 90s that crashed and burned. I think the field was called knowledge management, and it is thought to have mostly failed. You may read more details from @joepairman here.

I haven’t studied that attempt very deeply, so I’m not sure why it failed. But I suspect tacit knowledge extraction in the current era will be more aligned with the Cognitive Systems Engineering approach. That is: study what experts do, and then design joint cognitive systems to highlight cues and expectancies that experts pay attention to, so that human users benefit from the expertise extraction. I’m not entirely sure how AI may help with this approach, but there should be something here.

A good example of this might be Wendy Jephson’s work with Nasdaq. NDM podcast episode here; paper here.

Edited to add: I found a copy from the author’s website, and uploaded it below; alas, many of the details of the Nasdaq system are left out of the paper, for obvious reasons.

contribution836.pdf (574.2 KB)

4 Likes

Thanks for mentioning that thread, @cedric . Knowledge Management isn’t dead; in fact, as pure Gen AI initiatives struggle, there’s a bit more interest. (As a hint why: the improvisational, “good-enough” flavor of Generative AI is often far more reliable and useful if it’s applied around a framework of explicit knowledge models.)

A Gartner consultant mentioned to me a few months ago that a client had come to him wondering how to make (Gen) AI actually pay off for them. The client said “I hear I need to do knowledge management — what’s that?” (In case of any doubt, a Gartner consultant is not the best source to learn about knowledge management. They are excellent, however, for learning about the business models called “pay to play” and “double dip”.)

But for sure, even though tacit knowledge gets a mention in every KM book, even the most technology-oriented ones, that aspect of KM has failed. Tacit knowledge is gained and passed on in the same ad-hoc way it always has been. Probably passed on less now that there are fewer cues from a physical work environment. (There is a lot of truth behind the concept of “embodied cognition”, even if a lot of academic BS surrounds it.)

As for why codifying tacit knowledge hasn’t taken off? Probably because it makes people think too hard. And the results aren’t immediately visible.

Thank you for the link to the Jephson paper! Very useful and pithy.

I wonder what would happen if I tried to record what I know about good and bad ways to manage information tooling and projects. There are so many recurring patterns, but they’re not always obvious. I recognize them mostly through a gut feel, though I can articulate some of them OK.

5 Likes

Thanks for all the comments and resources. I’ve gone down quite the rabbit hole this morning digging through this. The way I’m thinking about my current goals in learning more about this:

  • I strongly concur with the now well-trod opinion that AI is unlikely to supplant most jobs, but rather will carve off pieces of jobs until the “uniquely human” components remain. For example, in a customer service setting, AI will take on the role of transcriber, summarizer, fact recaller, etc.
  • AI can recognize patterns and provide decision support. Continuing the customer service example, if a user calls in to diagnose a syncing problem with their file sharing system, AI can aggregate and present “cases” recorded in past support issues to prime the call center associate to recognize an issue that’s been solved before and offer avenues to attempt a resolution (a minimal sketch of this kind of case lookup follows this list)
  • Over time, many of these inbound calls would become relatively rote and could eventually be fully automated; for the exceptions, that’s where CTA/NDM starts to come into play. In our call center example, deep troubleshooting becomes the core skill in play, and as discussed in this thread (https://forum.commoncog.com/t/troubleshooting-as-a-perennial-skill/2574) it’s where humans beat AI
  • Our problem has now shifted from one of “capture all the information in a KMS” to “identify the uniquely human skills/scenarios in a job, and codify how to accelerate expertise in these skills”
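
To make that case-lookup idea concrete, here’s a minimal sketch. Everything in it (the `PAST_CASES` store, the naive keyword overlap) is an invented placeholder for whatever retrieval a real system would use; it only illustrates the shape of “surface similar solved cases to the rep”.

```python
# Minimal sketch: surface past support cases similar to the current issue,
# so the call-center rep can recognize a previously solved problem.
# PAST_CASES and the keyword scoring are illustrative placeholders, not a real system.

PAST_CASES = [
    {"id": 101, "summary": "files not syncing after password change", "resolution": "re-authenticate the desktop client"},
    {"id": 102, "summary": "sync stuck at 99% on large folders", "resolution": "exclude temp files from the sync set"},
]

def similar_cases(issue_description: str, top_n: int = 3):
    """Rank past cases by naive keyword overlap with the current issue."""
    issue_words = set(issue_description.lower().split())
    scored = []
    for case in PAST_CASES:
        overlap = len(issue_words & set(case["summary"].lower().split()))
        scored.append((overlap, case))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [case for score, case in scored[:top_n] if score > 0]

print(similar_cases("customer reports folders not syncing"))
```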

I still feel like there’s something there with GenAI/LLMs in particular - could a human/AI pair complete CTA at scale more effectively?

3 Likes

Well, there’s an ongoing research project by the Schmidt Futures foundation to use AI paired with humans to speed up CTA. I think it’s a separate program running in parallel to this one: https://naturalisticdecisionmaking.org/cta-in-eaffect/

I’m not entirely sure how it’s going, but I can reach out to the NDM folks to ask.

3 Likes

@Jonathan, I think “capture all the information in a KMS” is the 90s/2000s dream that Cedric referred to as crashing and burning.

And about CTA and tacit knowledge, I agree with you about the ongoing role of humans.

But I wanted to be a bit more specific about knowledge models, since I realized that I hadn’t answered these points directly:

I think these questions go beyond skill extraction. Knowledge includes knowledge about products and organizational structures, and all sorts of other networks of facts. And there are many related data models. So many that you could choose several different ways just to classify them:

  • Abstract or formal structure: Property-based/relational? Hierarchical? (With polyhierarchy?) If hierarchical, does the hierarchy signify membership, or constituent parts, or some kind of rank? If not hierarchical, what other groupings exist? (We can say “graph”, but that’s really a lowest common denominator. You can represent pretty much every other data structure with a graph.)
  • Purpose: is this for finding stuff, recommending stuff, prioritizing work, sense-making, or …? Are we describing content, or is the data actually the content itself?
  • (If the specific goal is capturing skill-based knowledge, I still like the idea of Concept Mapping as outlined by Gary Klein in Working Minds, which is basically a simple ontology that could be serialized either as triples or via a property graph structure; a toy sketch of both serializations follows this list. Though I have not tried concept mapping for its intended purpose of CTA.)
  • Longevity: is this short-lived or long?
  • Technical syntax / serialization: one of many forms of JSON/XML, or relational data, or some “pure graph” structure?
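
To illustrate that point about serializing a concept map, here’s a toy sketch. The concepts and relations are invented for illustration (not taken from Working Minds); the point is just that the same fragment can live as triples or as a property graph with attributes.

```python
# The same toy concept-map fragment serialized two ways.
# Concepts and relations here are invented, purely for illustration.

# 1. As subject-predicate-object triples (RDF-style):
triples = [
    ("syncing_problem", "is_diagnosed_by", "checking_auth_token"),
    ("checking_auth_token", "requires", "client_logs"),
    ("syncing_problem", "is_often_caused_by", "password_change"),
]

# 2. As a property graph: nodes carry attributes, edges carry a type.
nodes = {
    "syncing_problem": {"kind": "symptom", "frequency": "common"},
    "checking_auth_token": {"kind": "diagnostic_step"},
    "password_change": {"kind": "root_cause"},
    "client_logs": {"kind": "evidence"},
}
edges = [
    {"from": "syncing_problem", "to": "checking_auth_token", "type": "is_diagnosed_by"},
    {"from": "checking_auth_token", "to": "client_logs", "type": "requires"},
    {"from": "syncing_problem", "to": "password_change", "type": "is_often_caused_by"},
]
```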

One good viewpoint for the more human side of all this, ranging from librarian-style taxonomies to Cynefin, is Patrick Lambe’s great “Organising Knowledge”, quite expensive to buy outright but apparently readable with a £12/month subscription here: [PDF] Organising Knowledge by Patrick Lambe | 9781843342281, 9781780632001

And regarding the AI use cases you mentioned of decision-support and fact-recalling, just to note (as I’m sure you also have in mind) that AI is much more than just generative AI. Explicit knowledge models are more popular again as part of AI applications like these (having taken a knock for a couple of years when people overestimated the symbolic capabilities of generative AI). Sometimes people call it Semantic AI, though I think that’s a bit limiting. And there is still a big space for more “traditional”, task-trained machine learning, and I believe for symbolic approaches to AI that don’t necessarily involve explicit knowledge models …

Basically I just wanted to point out that the general field is quite huge! That is, knowledge management / modeling / applications, and related information theory. But on the specific topic of skill extraction, your thoughts make sense to me and I’m interested to read more about this.

2 Likes

I agree. Just about any model that’s explicitly defined is going to perform better than “pulling in a bunch of stuff”, because you’ve imbued the data with value by organizing it with intent.

I’m not a data scientist (clearly), but I’d be interested to see what would come out of throwing an LLM at a dataset, giving it the goal of “provide a formal structure for the patterns that emerge” and then having a human take a final pass at the data model. You could then run it through the model again to “fill out” the data model with all the known cases. I’m working on something like this right now in my spare time to teach my son how to get better at baseball :sweat_smile:
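
Something like this loop, maybe. It’s only a rough sketch of the propose-schema / human-review / fill-schema cycle described above; `call_llm` is a stand-in for whatever model or API you’d actually use, not a specific vendor client.

```python
# Rough sketch: (1) ask an LLM to propose a formal structure for patterns it sees
# in sample records, (2) let a human edit that schema, (3) re-run the model to
# populate the approved schema from every record.
# call_llm is a placeholder, not any particular vendor's API.

import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError("replace with your actual LLM client")

def propose_schema(sample_records: list[str]) -> str:
    prompt = (
        "Here are sample records:\n" + "\n".join(sample_records)
        + "\nPropose a JSON schema that captures the recurring patterns you see."
    )
    return call_llm(prompt)

def fill_schema(approved_schema: str, record: str) -> dict:
    prompt = (
        f"Using this schema:\n{approved_schema}\n"
        f"Extract the fields from this record as JSON:\n{record}"
    )
    return json.loads(call_llm(prompt))

# Workflow: run propose_schema() on a sample, have a human edit the result,
# then run fill_schema() over the full dataset with the approved schema.
```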

The link @cedric provided above featured an interesting case study where researchers performed CTA on auto accident assessors and were able to show a drastic improvement in the assessors’ ability to recognize and classify the types of damage.

Thanks for providing these examples/questions. An interesting thought here is: do we (humans) have to care about this anymore? Or can AI/ML systems be developed where we leave the performance tuning and knowledge structuring to the machines? The reason I specifically called out GenAI is that I’m viewing it as a nearly universal “translator” between humans and machines. I know this has limits in practice, but it won’t be long before RAG models do this nearly perfectly.

1 Like

Thank you @Jonathan for the question, and @cedric for connecting the dots back to @joepairman’s writeup!

And:


I want to think out loud through this problem from first principles and ideate on how LLMs might be used for skills extraction at scale. Please challenge me if I’m missing something or making incorrect assumptions!

Thesis

Rather than asking people to explicitly document their expertise (which fails because it is extra work and demands a different skill set), an LLM can be deployed in fields where the work itself generates documentation and where we can measure outcomes objectively. This creates a natural dataset for LLM-based expertise extraction. Let me elaborate:

Criterion A: Auto-Documenting Work

What fields have auto-documentation, where documentation is not an afterthought but an integral part of the work itself (if not the entirety of the work product)? This kind of aligns with the thinking in “note-taking makes you a better expert” - when information creation is core to the job, it naturally captures the application of skill.

I’m going down this avenue because my experience with knowledge management (limited, but mostly negative) has consistently looked like this:

  1. Each KM system requires supplementary work beyond core responsibilities
  2. Creating that documentation demands different expertise than the actual job
  3. As a result, it either gets disregarded/ignored, or, if there is a mandate around it, it gets managed by separate processes or people (sometimes becoming compliance theater rather than expertise capture). If not compliance theater, then those KM systems shift to focus on training and onboarding, and thus again don’t directly represent any organizational expertise.

Criterion B: Measurable Outcomes

As a corollary to auto-documenting skill application, there needs to be a clear connection between work artifacts and results. This gives us objective measures of expertise rather than subjective assessments.

Yes, establishing causation can be challenging in complex systems, but my guess is that sufficient data over time can identify meaningful correlations in a domain. If those correlations can’t be established (or on meta-analysis reflect a system that’s being gamed), then exploration should shift to a new avenue.
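
As a minimal sketch of what “connect artifacts to results” might look like mechanically: join features of the work artifacts to an outcome measure and scan for correlations. All of the column names below (author, avg_commit_size, defects_per_release) are invented for illustration; the real fields depend entirely on the domain.

```python
# Sketch of Criterion B: join work-artifact features to outcome measures and
# look for candidate correlations. Field names are illustrative placeholders.

import pandas as pd

artifacts = pd.DataFrame({
    "author": ["a", "b", "c"],
    "avg_commit_size": [120, 480, 90],
    "review_comments_addressed": [0.9, 0.4, 0.8],
})
outcomes = pd.DataFrame({
    "author": ["a", "b", "c"],
    "defects_per_release": [0.2, 1.5, 0.3],
})

joined = artifacts.merge(outcomes, on="author")

# Correlation only flags relationships worth investigating; it is not proof
# that an artifact feature causes an outcome.
print(joined.corr(numeric_only=True)["defects_per_release"])
```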

LLMs & Skill Extraction: Solving One Black Box by Applying Another

I want to digress into cybernetics for a second, particularly Stafford Beer’s POSIWID (The Purpose of a System is What it Does) and Black Boxes:

According to the cybernetician, the purpose of a system is what it does. This stands for bald fact, which makes a better starting point in seeking understanding than familiar attributions of good intention, prejudices about expectations, moral judgment, or sheer ignorance of circumstances.

Beer suggests not opening black boxes, and evaluating them only by what they do. I’m interested in this framing for skills extraction at scale with LLMs because it aligns neatly with using auto-documented work as representative of skill. If the work is your application of skill and expertise, you don’t need to further prod into the human rationalization behind it. The work is the POSIWID’ing black box.

On one hand, it’s not impossible to create a surveying mechanism that mimics CTA interviews and feeds that into an LLM to do skill extraction at scale. But on the other hand, if you already have a body of auto-generated documents for LLM ingestion that is representative of the application of skill, and data sets that evaluate the outcomes of those applications, then why not just do that and use the LLM for pattern-matching analysis? A cybernetic ‘black box’ approach means that you can take an artifact of work and disregard intention. That is, if you’re writing code, then the code you’ve written is your artifact of expertise; your intentions do not matter and only introduce messiness. To invert the cybernetics term: at scale, what you do is your expertise. This:

  • Eliminates intention as a complicating factor
  • Allows you to account for exceptional variation over time
  • Uses observable artifacts to do pattern matching at scale
  • Makes LLM analysis more tractable

So what can this be applied to?

My professional bias suggests three areas. (I was on the fence about Product Management as a fourth, but I think it’s very hard to draw the line from product documents to implementation and impact. Too many confounding variables and weak connections.)

1. Software Engineering

Artifacts: Code and commits
Measurable Outcomes: Using DORA metrics and observability data, you can connect commits to:

  • Production defects and downtime
  • Code longevity (as a proxy for quality)
  • System performance metrics

The limitation is that it could end up measuring “skill at hitting metrics” rather than true engineering expertise.

2. Content Marketing

Artifacts: Written content (articles, interviews, etc.)
Measurable Outcomes: Direct ties to:

  • Views and engagement
  • Lead generation
  • Conversion rates
  • Time on page

If all you have to work with is written content, and you measure the business success of that content by a series of engagement metrics, then it feels like a good proxy for expertise. Sub in the metrics that you care about.

3. Clinical Documentation (Most Promising)

Artifacts: Patient charts and clinical documentation
Measurable Outcomes: Patient outcomes tracked through:

  • Care journey progression
  • Readmission rates
  • Treatment effectiveness
  • Recovery metrics

Every patient interaction creates a chart documenting:

  • Initial assessment and chosen tests (reflecting clinical judgment)
  • Interventions selected
  • Decision-making process

All actions are recorded for compliance, creating a rich dataset that naturally captures the application of clinical expertise, even when controlling for documentation styles.

Application & Processing

  1. Statistical Analysis: Cue a speed-run through the data-driven blog series: I’d lead with Deming’s approach to process analysis and identify high-performing vs. low-performing outputs…
  2. LLM Pattern Recognition: … except rather than purely statistical analysis, this is where LLMs could be applied to analyze free-text differences in the work artifacts between performance levels and to identify intrinsic and extrinsic patterns (a rough sketch of these two steps follows this list).
  3. Do this at scale.
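
Here’s a rough sketch of what steps 1-2 might look like, under loose assumptions: artifacts paired with a numeric outcome score, a crude quartile split standing in for a proper Deming-style process analysis, and `call_llm` as a placeholder for whatever model you’d use.

```python
# Sketch of steps 1-2: split work artifacts into high- and low-performing groups
# by an outcome metric, then ask an LLM to describe what differs between them.
# The record shape, quartile split, and call_llm are all illustrative placeholders.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("replace with your actual LLM client")

def contrast_performance(records: list[dict]) -> str:
    """records: [{"artifact": <free text>, "outcome": <numeric score>}, ...]"""
    ranked = sorted(records, key=lambda r: r["outcome"])
    cutoff = max(1, len(ranked) // 4)  # bottom/top quartiles; control limits would be better
    low = [r["artifact"] for r in ranked[:cutoff]]
    high = [r["artifact"] for r in ranked[-cutoff:]]
    prompt = (
        "Low-performing artifacts:\n" + "\n---\n".join(low)
        + "\n\nHigh-performing artifacts:\n" + "\n---\n".join(high)
        + "\n\nDescribe the recurring differences between the two groups."
    )
    return call_llm(prompt)
```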

As an example, a colleague I’ve been talking with recently in healthcare is currently testing this approach, albeit semi-manually:

  • Using clinical documentation and patient outcomes to identify provider performance
  • Using LLMs to analyze patterns in high-performing documentation
  • Then using that to create targeted training for individual providers and department groups with poor performance, while using the high-performing artifacts as a benchmark.

Obvious problems with this approach:

  1. Metric Gaming: Risk of optimizing for metrics rather than underlying outcomes
  2. Context Loss: Tacit knowledge may not always appear in documentation, and even auto-documenting work can have some lossiness (e.g., a staff engineer whose expertise is in pair programming and who has little individual output).
  3. Causation vs. Correlation: Difficulty establishing true causal relationships

Is this a useful way to think about knowledge extraction at scale with LLMs, or am I falling into well-trod paths that have been tried, tested, and discarded?

1 Like

I agree this is challenging, but would also say that there are ways LLMs could help here too. Consider my previous example of the call center - an AI “call summarizer” is doing the work of producing documentation based on the conversational inputs of the call center rep and the customer. This could be extended to aggregate and encode any interpersonal communication.

Today we want that call summarizer to produce something human readable. Is there some future where instead of summarizing it, it encodes it in a way that it becomes a “leaf” on a “tree of knowledge”? This goes back to @joepairman’s point about explicit knowledge structures as well.
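
As a sketch of what such a “leaf” might look like (all the field names are invented, just to make the idea concrete): the summarizer would emit a structured node that can be attached to an explicit knowledge model rather than a blob of prose.

```python
# Sketch of a "leaf on a tree of knowledge": a structured node a call summarizer
# could emit instead of free text. Every field here is invented for illustration.

from dataclasses import dataclass, field

@dataclass
class KnowledgeLeaf:
    topic: str                                            # e.g. "file sync failures"
    symptom: str
    resolution: str
    cues: list[str] = field(default_factory=list)         # what pointed the rep toward the fix
    parent_path: list[str] = field(default_factory=list)  # where this hangs on the tree

leaf = KnowledgeLeaf(
    topic="file sync failures",
    symptom="sync stops after corporate password reset",
    resolution="re-authenticate the desktop client",
    cues=["recent password change", "auth errors in client log"],
    parent_path=["support", "file-sharing", "sync"],
)
```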

Couldn’t agree more. That documentation’s value isn’t inherent; it’s derived from creating value somewhere else (e.g., your “knowledge tree” helps you reduce customer service times from an average of 8 minutes down to 4 minutes, or reduces calls altogether by creating a bot that can answer the questions).

Finally, this takes me to a point that came up in conversation with a colleague today. There’s different value in both “common knowledge” and “uncommon knowledge,” and your method for extracting each would differ. You could imagine gathering “common knowledge” through simple interview bots that ask the same questions to many people (e.g., ask every software engineer in a particular function “How do you set up a CI/CD pipeline for [area]?” or “How do you deploy an application to Kubernetes?”). The shape of this common knowledge would likely converge strongly if it’s widely known and well understood. Just to continue butchering my tree metaphor, this would be a highly shaped tree in a well-manicured garden. When tougher questions get asked and the answers are more diversified (or just messier) - that’s when you bring in humans to apply something like CTA to “create” an artificial tree instead of letting the answers shape it organically.
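
One way to operationalize that “does it converge?” test, as a loose sketch: collect many answers to the same bot-asked question, score how similar they are, and only route the messy, low-convergence questions to human-led CTA. The bag-of-words similarity below is a crude placeholder for a real embedding model, and the 0.5 threshold is arbitrary.

```python
# Sketch: measure how strongly crowd-sourced answers to the same question converge,
# and flag low-convergence questions for human-led CTA.
# Word-count cosine similarity stands in for proper embeddings; the threshold is arbitrary.

from collections import Counter
from itertools import combinations
from math import sqrt

def similarity(a: str, b: str) -> float:
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def convergence(answers: list[str]) -> float:
    pairs = list(combinations(answers, 2))
    return sum(similarity(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0

answers = [
    "Set up a pipeline with build, test, and deploy stages",
    "Build the app, run tests, then deploy via the pipeline",
]
score = convergence(answers)
print("route to human CTA" if score < 0.5 else "converged: keep the bot-gathered answer")
```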

Distilling it down into a single question - is it naive and overly simplistic to think you can build a knowledge corpus with a combination of crowdsourcing and more directed CTA?

1 Like