Who’s kidding whom?

Originally published on LinkedIn 13Feb2025

Almost every day someone posts an article about the newest “Silly LLM” meme. Whether it is giving someone a new spouse, as reported in Fortune, or an amazingly toxic recipe (for disaster), the feeds are alive with these reports. Of course, a lot depends on a) what you are using the LLM for and b) how trustworthy your other sources are perceived to be.

That is probably why we see so many conflicting articles: one day it is “Generative AI outperforms doctors in medical diagnosis,” the next day it is “Study reveals AI’s critical flaw in medical decision making.”

Underneath there is a strong drumbeat of “If you use the LLM for what it is good at, you can really boost your business performance.” They tell you to stick with summarization, email generation, help-desk chatbots, and you’ll be fine. But will you be? Maybe ask Air Canada about the recent court decision forcing them to honor a ‘hallucinated’ discount for a passenger.

At GunderFish, we think that generative AIs can be a valuable tool, just not the way they are currently operating. The so-called ‘hallucinations’ crop up everywhere, and require a keen and knowledgeable eye to detect. So much so that it may be more work to correct the errors than it would be to do the work yourself. That is why we focus on integrating Semantic AI into the tool chain, an overwatch system that can detect the more egregious errors before they get into print.

We focus on providing our clients with solutions that integrate AI (not just LLMs) into their business processes. One service helps clients navigate through the complex and rapidly shifting impacts of regulatory changes on their operations. It helps them winnow through the chaff to see what may be important – not only to their business directly, but also to the businesses they depend on. So, we need to manage a complex web of interconnected businesses, and to do so we rely on the North American Industry Classification System (NAICS).

This hierarchical breakdown of business types into around a thousand individual codes has been in use for decades. It is well known and well documented. It should be a piece of cake for the LLMs to navigate, right?

Well, we did some tests.

We began with several of our in-house open source LLMs – these can be custom tuned for our clients and we have (some) control of the underlying data sets, especially using well curated data for RAG-based queries. Given that the base datasets rely on the scraped web over many years, and the common use of NAICS codes for business classification, there should be plenty of accurate source data in the LLM training set.

We asked a simple question – “What other types of businesses does this business depend on to operate?” We requested the output in a table with three columns: the NAICS code of the target business, the code of each related business, and the short descriptive title of the business type.

Without naming names, let’s just say we were disappointed. One response indicated that a small sit-down Mexican restaurant could not survive without a good relationship with a partner in the auto glass repair industry. Before you jump to any conclusions, this also held with Italian restaurants.

A second accurately identified key support businesses, but then simply made up random codes for the NAICS classifications.

One system, touted as a cutting-edge “reasoning model,” attempted to reverse-engineer the coding scheme used in the NAICS rather than just looking up the answer, with predictable (admittedly humorous) asides as it explained its deductions.

Okay, but these were the open source models. Surely the commercial models would fare better, right?

One of the most popular models began by misclassifying the Mexican restaurant’s NAICS code. Rather than the code for a full service restaurant, it insisted that any Mexican restaurant must be a commercial caterer. No matter how many times we told it what the restaurant did (including pasting the NAICS description, word for word, into the prompts) it insisted that our target business did not have a restaurant. While it did capture some of the associated businesses correctly, it included others that were specific to catering.

Another did a relatively good job assessing the associated businesses, but then went off the rails in the details. Yes, a restaurant depends on having fresh food delivered – but where did the code for soybean farms come from? I mean, how many soybeans go into a chimichanga?

Also, apparently restaurants routinely make use of child labor paid for by the parents, since a key business code that kept coming up was the one for Child Day Care Services – labeled as temporary labor.
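Two of the failures above, an invented code and a real code carrying the wrong label, are the kind of error a plain lookup can catch. What follows is a minimal illustrative sketch in Python, not a description of any production system. The names `NAICS_TITLES` and `check_rows` are hypothetical, and the small lookup table is a hand-typed subset for illustration only; a real check would load the full official NAICS list for the edition in use.

```python
# Hypothetical, hand-typed subset of NAICS codes for illustration only.
# A real check should load the complete official list instead.
NAICS_TITLES = {
    "722511": "Full-Service Restaurants",
    "424480": "Fresh Fruit and Vegetable Merchant Wholesalers",
    "561320": "Temporary Help Services",
    "624410": "Child Day Care Services",
}


def check_rows(rows):
    """Check (target_code, related_code, title) rows against the lookup.

    Returns a list of (row, problem) pairs; an empty list means every
    code was found and every title matched the reference title.
    """
    problems = []
    for target, related, title in rows:
        row = (target, related, title)
        # Flag any code that does not exist in the reference table.
        for code in (target, related):
            if code not in NAICS_TITLES:
                problems.append((row, f"unknown code {code}"))
        # Flag a real code whose label disagrees with the reference title.
        expected = NAICS_TITLES.get(related)
        if expected is not None and expected.lower() != title.strip().lower():
            problems.append((row, f"title mismatch: expected '{expected}'"))
    return problems


# Rows shaped like the three-column table requested from the models.
example_rows = [
    ("722511", "424480", "Fresh Fruit and Vegetable Merchant Wholesalers"),
    ("722511", "999999", "Tortilla Supply"),   # invented code
    ("722511", "624410", "Temporary Labor"),   # real code, wrong label
]

if __name__ == "__main__":
    for row, problem in check_rows(example_rows):
        print(row, "->", problem)
```

A lookup like this only confirms that a code exists and that its label matches. It says nothing about whether the dependency itself makes sense, so it would not flag a valid code for auto glass repair or soybean farming attached to a restaurant.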

All the silliness aside, yes it is possible to leverage LLMs to improve a business, but it really needs a supporting system to make sure that the results pass the laugh test. That’s why, at GunderFish, we integrate a semantic component into our products and services. You really don’t want your business to be the source of the next “Silly LLM” meme.
