Having successfully AutoClassified more than 44 billion files, we've learned a few clever things along the way. Here are our Top 10 AutoClassification Pro Tips:
1. Utilize Lists
Capitalize on previous work. Lists, tables, policies and other reference files all go a long way toward assisting an AutoClassification effort. Common helpful lists include Employee Names, Product Names, Office Locations, Active Litigation Matters, and Suppliers. There's no need to supply generic lists such as Cities and Zip Codes; those should be built into any strong AutoClassification platform.
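To make the idea concrete, here is a minimal sketch, in Python, of how a supplied reference list can drive tagging. It assumes simple case-insensitive, whole-phrase matching. The list names, entries and the `tag_from_lists` helper are hypothetical and stand in for whatever your platform actually uses. Note that each hit is reported in the list's own spelling, so the list also serves as the normalized field value.

```python
import re

# Hypothetical reference lists supplied by the organization (tip 1).
REFERENCE_LISTS = {
    "Employee": ["Robert J. Smith", "John R. Sullivan"],
    "Product": ["Widget Pro", "Widget Lite"],
    "Office": ["Cincinnati", "Denver"],
}

def tag_from_lists(text, lists=REFERENCE_LISTS):
    """Return {list name: sorted list entries found in text as whole phrases}."""
    tags = {}
    for list_name, entries in lists.items():
        hits = sorted(
            entry for entry in entries
            # \b keeps "Denver" from matching inside a longer word.
            if re.search(r"\b" + re.escape(entry) + r"\b", text, re.IGNORECASE)
        )
        if hits:
            tags[list_name] = hits
    return tags

print(tag_from_lists("Widget Pro shipped from the Denver office; ask john r. sullivan."))
# {'Employee': ['John R. Sullivan'], 'Product': ['Widget Pro'], 'Office': ['Denver']}
```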
2. Stop Searching
Try sorting on Tags and Attributes. If you are AutoClassifying, you've already gone to some effort to tag your content properly and identify its critical attributes. Put those rich metadata tags to work for you. Instead of crunching complex searches, try drilling down by attribute. Don't use blind search to find what you've already tagged!
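Here's a small Python sketch of what "drilling down" means in practice: filtering on tags that were assigned earlier rather than re-scanning text. The `Doc` record, the facet names `DocType` and `Office`, and the sample files are all hypothetical. A real platform would run these filters against its own metadata index, but the logic is the same: exact matches on facets plus per-value counts.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    path: str
    tags: dict = field(default_factory=dict)  # facet name -> value assigned during AutoClassification

def drill_down(docs, **facets):
    """Keep only the documents whose tags match every requested facet value."""
    return [d for d in docs if all(d.tags.get(k) == v for k, v in facets.items())]

def facet_counts(docs, facet):
    """Count documents per value of one facet, like the counts in a filter sidebar."""
    counts = {}
    for d in docs:
        value = d.tags.get(facet)
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    return counts

# Hypothetical, already-tagged sample content.
DOCS = [
    Doc("a.pdf", {"DocType": "Contract", "Office": "Denver"}),
    Doc("b.pdf", {"DocType": "Invoice", "Office": "Denver"}),
    Doc("c.pdf", {"DocType": "Contract", "Office": "Cincinnati"}),
]

print(facet_counts(DOCS, "DocType"))  # {'Contract': 2, 'Invoice': 1}
print([d.path for d in drill_down(DOCS, DocType="Contract", Office="Denver")])  # ['a.pdf']
```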
3. Avoid Short Abbreviations
Be wary of the unintended consequences of short abbreviations, especially those of 2 characters or fewer, such as US state abbreviations. These strings turn up so often in general English, both as standalone words and inside longer words, that they are unreliable for AutoClassification. Consider the phrase, "Were you in Cincinnati; oh, I didn't know." [Note: Cincinnati, OH appearing in sequence]. Here the "oh" is an interjection, not the state of Ohio. The problem is especially pronounced when your document population contains scanned paper files, because OCR accuracy on 2-character strings can be poor.
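The sketch below, in Python, shows the false-positive problem on the example phrase. It uses a hypothetical sample of five real US state codes. The naive matcher flags "in" (Indiana) as well as "oh". The stricter pattern, which requires an uppercase code right after "City, ", is only an illustrative heuristic and not a recommended rule: it still depends on clean punctuation and capitalization, which is exactly what OCR can damage.

```python
import re

STATE_CODES = {"OH", "IN", "ME", "OR", "HI"}  # small sample of two-letter US state codes

def naive_state_hits(text):
    """Case-insensitive whole-word matching: the approach this tip warns against."""
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if w.upper() in STATE_CODES]

def stricter_state_hits(text):
    """Heuristic: only accept an uppercase code directly after 'City, '."""
    return [c for c in re.findall(r"\b[A-Z][a-z]+, ([A-Z]{2})\b", text) if c in STATE_CODES]

phrase = "Were you in Cincinnati; oh, I didn't know."
print(naive_state_hits(phrase))     # ['in', 'oh']  (two false positives)
print(stricter_state_hits(phrase))  # []

address = "Offices in Cincinnati, OH and Portland, OR."
print(naive_state_hits(address))     # ['in', 'OH', 'OR']  ('in' is still a false positive)
print(stricter_state_hits(address))  # ['OH', 'OR']
```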
4. Format Dates Properly
Use the ISO Date Standard for Field Values. There are so many variations in how we write dates! Consider the following: 6/15/20; 06/15/20; 6/15/2020; 15 June 2020; June 15th, 2020; and so on. European-style dates, such as 15/06/20, complicate things further. The ISO 8601 date format, 2020-06-15, solves a lot of AutoClassification headaches, and because it runs from year to month to day, your dates will also sort chronologically.
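Here's a minimal Python sketch of normalizing those variants to ISO 8601 with the standard library's `datetime.strptime`. The format list is an assumption: it treats numeric dates as US month-first. A source that writes 15/06/20 needs its own day-first list, chosen per source, because a date like 06/05/20 is ambiguous and can't be resolved from the string alone. The `to_iso` helper is hypothetical.

```python
import re
from datetime import datetime

# Formats tried in order. Assumes month-first (US) numeric dates.
US_FORMATS = ["%m/%d/%y", "%m/%d/%Y", "%d %B %Y", "%B %d, %Y"]

def to_iso(raw, formats=US_FORMATS):
    """Return an ISO 8601 date string (YYYY-MM-DD), or None if no format matched."""
    # Strip ordinal suffixes ("15th" -> "15") so strptime can read them.
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", raw.strip())
    for fmt in formats:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None

variants = ["6/15/20", "06/15/20", "6/15/2020", "15 June 2020", "June 15th, 2020"]
print({to_iso(v) for v in variants})  # {'2020-06-15'}
print(to_iso("15/06/20"))             # None: day-first input is rejected, not misread
```

Because ISO strings run from year to month to day, a plain text sort of the output is also a chronological sort.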
5. Normalize Names
Choose the longest form of the name. As with dates, many people have nicknames and are addressed in multiple ways, both formally and informally. Consider Robert, Rob, Bob, Bobby, etc. Keep your field values clean, consistent and complete by using the person's full name, like this: Smith, Robert J. Then normalize every variant to that form.
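A minimal Python sketch of that normalization step follows. It assumes two hypothetical inputs: a nickname table, and a directory keyed by formal first name and surname (the kind of Employee Names list from tip 1). It only handles "First Last" and "Last, First" layouts. Real name matching needs much more care, for example two employees who share a name.

```python
# Hypothetical nickname table; a real one would be far larger and organization-specific.
NICKNAMES = {"rob": "Robert", "bob": "Robert", "bobby": "Robert", "robert": "Robert"}

# Hypothetical directory: (formal first name, surname) -> canonical field value.
DIRECTORY = {("Robert", "Smith"): "Smith, Robert J."}

def normalize_name(raw):
    """Map a name variant to its canonical 'Last, First M.' form, or None if unknown."""
    raw = raw.strip()
    if not raw:
        return None
    if "," in raw:                      # "Smith, Bob" layout
        last, rest = [p.strip() for p in raw.split(",", 1)]
        first = rest.split()[0] if rest else ""
    else:                               # "Bob Smith" or "Bobby J. Smith" layout
        tokens = raw.split()
        first, last = tokens[0], tokens[-1]
    formal_first = NICKNAMES.get(first.lower(), first.title())
    return DIRECTORY.get((formal_first, last.title()))

for variant in ["Bob Smith", "bobby smith", "Smith, Rob", "Smith, Robert J."]:
    print(normalize_name(variant))  # Smith, Robert J. (four times)
```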
6. Explain Jargon
Share the love. Make sure to explain and document code names, trade jargon, inside jokes and other internal references for your Implementation team. Don't assume "everybody knows that" when it comes to AutoClassification. Even when some people can make an educated guess that "Sully" is John R. Sullivan, the software probably can't (or won't be confident enough in the match to act on it).
7. Better Blanks
Blank vs. Null. Try not to use a blank as a legitimate field value. If a data facet is truly not present on a document, use "N/A," "not found," or "None" to record a deliberate, negative field value and to distinguish it from an accidental omission. Similarly, use the word "Null" to indicate that no field value was attempted. Example: a letter with no Subject or Re line correctly has a blank title, recorded as Title = None. Because it is a letter, it has a Null value for Check Number (a field present only on checks).
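Here's a Python sketch of the rule, assuming a hypothetical table of which fields apply to which document types. The `field_value` helper never stores an empty string. It returns "Null" when the field doesn't apply, so no value was attempted. It returns "None" when the field applies but nothing was found. Here, Python's own `None` is only the extractor's "found nothing" signal. It is never written to the field.

```python
# Hypothetical mapping of document types to the fields that apply to them.
APPLICABLE_FIELDS = {
    "Letter": {"Title", "Date"},
    "Check": {"Title", "Date", "Check Number"},
}

def field_value(doc_type, field_name, extracted):
    """Return the string to store for one field; never returns an empty string.

    `extracted` is what a (hypothetical) extractor found: a string, or
    Python's None when it found nothing.
    """
    if field_name not in APPLICABLE_FIELDS.get(doc_type, set()):
        return "Null"         # field doesn't apply to this type: no value attempted
    if extracted is None or not extracted.strip():
        return "None"         # field applies, was looked for, and is truly absent
    return extracted.strip()  # a real value

print(field_value("Letter", "Title", ""))              # None
print(field_value("Letter", "Check Number", None))     # Null
print(field_value("Check", "Check Number", " 1042 "))  # 1042
```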
8. Tag It All
If you're tagging for one parameter, you might as well tag for the rest. Consider use cases beyond your own domain. Focused on retention? Consider data privacy. Aiming for legal hold? Remember lifecycle management. Once you've committed to configuring, crawling and tagging content for one purpose, the economics favor tagging for most, if not all, of them at the same time. It's like buying the pasta, sauce and cheese on one smart grocery trip instead of three separate ones.
9. Manage Legal Hold
Legal Hold as a Content State. With AutoClassification, the Legal Hold process comes down to understanding what should be under hold and for how long. Legal Hold should not be a permanent status; at most, it is a revolving one for high-profile custodians involved in frequent matters. Instead, tag each file for each and every Legal Hold it is subject to. Then, when matters settle, remove the corresponding holds and revert to "normal" retention. Stop using past holds to protect against future needs. Make and maintain lists of current, active Legal Holds, as well as prior ones, and refresh them regularly.
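The Python sketch below models a hold as content state. It is a minimal illustration, assuming a per-file set of matter IDs, with all names hypothetical (`FileRecord`, `apply_hold`, `release_hold`, retention labels such as "Email-3yr", matter IDs such as "M-001"). A file stays on hold while any of its holds is active. Releasing one matter removes only that tag and moves the matter to the prior-holds list. A file with no remaining holds falls back to its normal retention category.

```python
from dataclasses import dataclass, field

@dataclass
class FileRecord:
    path: str
    retention_category: str                  # the file's "normal" retention rule
    holds: set = field(default_factory=set)  # IDs of every active hold on this file

    @property
    def effective_state(self):
        # Any active hold overrides normal retention; no holds means normal rules apply.
        return "On Hold" if self.holds else self.retention_category

def apply_hold(files, matter_id, predicate, active_holds):
    """Tag every file selected by predicate with this matter's hold."""
    active_holds.add(matter_id)
    for f in files:
        if predicate(f):
            f.holds.add(matter_id)

def release_hold(files, matter_id, active_holds, prior_holds):
    """Matter settled: remove its tag everywhere and move it to the prior-holds list."""
    for f in files:
        f.holds.discard(matter_id)
    active_holds.discard(matter_id)
    prior_holds.add(matter_id)

files = [FileRecord("a.msg", "Email-3yr"), FileRecord("b.docx", "Contract-7yr")]
active, prior = set(), set()
apply_hold(files, "M-001", lambda f: True, active)
apply_hold(files, "M-002", lambda f: f.path == "a.msg", active)
release_hold(files, "M-001", active, prior)
print([f.effective_state for f in files])  # ['On Hold', 'Contract-7yr']
```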
10. Use a Taxonomy
Taxonomies reinforce logical thinking. Most organizations use a taxonomy, whether they know it or not. Any organizational structure counts if it is meant to guide the people who create and manage content, as well as those who query or retrieve it. A logical, documented taxonomy goes a long, long way toward helping both the humans and the systems meet the AutoClassification objective, and it is well worth the effort.