{"id":22107,"date":"2023-06-19T16:39:01","date_gmt":"2023-06-19T16:39:01","guid":{"rendered":"https:\/\/www.imprima.com\/?p=22107"},"modified":"2023-06-22T08:33:14","modified_gmt":"2023-06-22T08:33:14","slug":"smart-vdrs-it-is-all-about-accuracy-continued","status":"publish","type":"post","link":"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy","title":{"rendered":"Accuracy of AI-redaction"},"content":{"rendered":"\n<p>In our <a href=\"https:\/\/www.imprima.com\/blog\/automated-redaction-llm\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>previous blog<\/strong> <strong>post<\/strong><\/a> we discussed how advanced AI technology &#8211; driven by Large Language Models &#8211; can be used for automatic redaction of sensitive information. We mentioned that very high accuracy can be achieved with such technology.<\/p>\n\n\n\n<p>In this post we will show that the latter is indeed the case.<\/p>\n\n\n\n<div style=\"height:42px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><Span style=\"font-weight:800\">Recall<\/span><\/p>\n\n\n\n<p>We already addressed accuracy of AI-driven data extraction and redaction in a blog post in January: <strong><a href=\"https:\/\/www.imprima.com\/blog\/smart-vdrs-accuracy\" target=\"_blank\" rel=\"noreferrer noopener\">Smart VDRs \u2013 It is all about Accuracy<\/a><\/strong>. We discussed how \u201cRecall\u201d is of paramount importance. Without a high recall, there is no point to <strong><a href=\"https:\/\/www.imprima.com\/ai-due-diligence\/smart-redaction\" target=\"_blank\" rel=\"noreferrer noopener\">automated redaction<\/a><\/strong> or <strong><a href=\"https:\/\/www.imprima.com\/ai-due-diligence\/automated-contract-summaries\" target=\"_blank\" rel=\"noreferrer noopener\">key data extraction<\/a><\/strong>: the tools need to be able to find all the terms that need to be redacted\/extracted.<\/p>\n\n\n\n<p>Let\u2019s focus on the redaction use case again. 
Here are the results from one of the experiments we ran to evaluate the Imprima LLM-driven Smart Redaction technology.<\/p>\n\n\n\n<div style=\"height:42px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/www.imprima.com\/wp-content\/uploads\/2023\/06\/table-V1Asset-5-1024x737.png\" alt=\"\" class=\"wp-image-22113\" width=\"696\" height=\"500\" srcset=\"https:\/\/www.imprima.com\/wp-content\/uploads\/2023\/06\/table-V1Asset-5-1024x737.png 1024w, https:\/\/www.imprima.com\/wp-content\/uploads\/2023\/06\/table-V1Asset-5-300x216.png 300w, https:\/\/www.imprima.com\/wp-content\/uploads\/2023\/06\/table-V1Asset-5-768x553.png 768w, https:\/\/www.imprima.com\/wp-content\/uploads\/2023\/06\/table-V1Asset-5.png 1500w\" sizes=\"(max-width: 696px) 100vw, 696px\" \/><figcaption class=\"wp-element-caption\">Recall and Precision when redacting PII in a set of Data Room documents. The test was executed with 5-fold cross validation, and the results are the averages of those 5 tests and are therefore statistically significant.<\/figcaption><\/figure>\n\n\n\n<div style=\"height:42px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>Let\u2019s have a closer look at this experiment&#8230;<\/p>\n\n\n\n<p>The results show that the recall is very high, with an average recall of 93%. As mentioned previously, for redaction, high recall is the key objective (making sure that the redaction finds as many as possible of the terms that need to be redacted).<\/p>\n\n\n\n<div style=\"height:42px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><Span style=\"font-weight:800\">Precision<\/span><\/p>\n\n\n\n<p>However, recall is not the only key measure of a model\u2019s accuracy; \u201cprecision\u201d also needs to be considered. 
Precision is a measure of how many False Positives are generated (or rather, how few, since high Precision means few False Positives). In other words, how many terms are redacted that should <em>not<\/em> have been redacted. We touched upon this in our January blog post, but did not go into detail. Here we will go a bit deeper.<\/p>\n\n\n\n<p>First of all, note that it is very easy to achieve a very high recall when we allow Precision to be very low. To illustrate that, consider this thought experiment:<\/p>\n\n\n\n<p><em><strong>If we redact everything in a document, every single word and number, the recall is guaranteed to be 100%, right? But the result is also useless, obviously. You could just as well have deleted the document.<\/strong><\/em><\/p>\n\n\n\n<p>That is an extreme case, of course, but even with the most accurate AI tech, perfect recall <em>and<\/em> perfect precision cannot be achieved at the same time.<\/p>\n\n\n\n<p>With our LLM-driven <strong><a href=\"https:\/\/www.imprima.com\/ai-due-diligence\/smart-redaction\" target=\"_blank\" rel=\"noreferrer noopener\">redaction tool<\/a><\/strong>, we aim to optimise recall, while still achieving a high level of Precision.<\/p>\n\n\n\n<p>As a result, we typically get a recall of well over 90%, with a lower precision that is still around 80-90% (in the example above it is on the higher end of that range).<\/p>\n\n\n\n<div style=\"height:42px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><Span style=\"font-weight:800\">How to deal with that In Practice<\/span><\/p>\n\n\n\n<p>So, what does this mean for you in practice? Let\u2019s take the lower end of the range, 80% precision: this would mean that of all redacted terms, 20% should not have been redacted. Sound like a lot? Well, it is not really. Suppose 5% of the terms (words, numbers, etc.) 
in a document need to be redacted (and in practice it is probably much less for most docs). It then means that only about 1% of the words in the document are redacted that should not have been redacted*. That will hardly make the document unreadable.<\/p>\n\n\n\n<p>That said, even those erroneously redacted words can be easily removed, with the right tools. More about that in our blog post next week.<\/p>\n\n\n\n<p>Stay tuned&#8230;<\/p>\n\n\n\n<div style=\"height:42px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><Span style=\"font-weight:800\">Conclusion<\/span><\/p>\n\n\n\n<p>AI redaction can save you an enormous amount of time and avoid human error. But it has to be AI redaction that really works: AI redaction that is accurate. As discussed in our blog post of two weeks ago, traditional automation techniques don\u2019t work. And as discussed in last week\u2019s blog post, the only way to achieve high accuracy is via AI based on Large Language Models.<\/p>\n\n\n\n<div style=\"height:42px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><span style=\"font-weight:500\">Are you looking for a VDR with fully integrated redaction software which leverages AI? Speak to our <a href=\"https:\/\/www.imprima.com\/contact\/sales\/?productfeature=SMART%20REDACTION%20LLM\" target=\"_blank\" rel=\"noreferrer noopener\">sales team<\/a> or check out our <a href=\"https:\/\/www.imprima.com\/virtual-data-room\/smart-redaction\" target=\"_blank\" rel=\"noreferrer noopener\">Smart Redaction page here<\/a>.<\/span><\/p>\n\n\n\n<div style=\"height:42px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<br><p>* 5% x 20%\/80% = 1.25%<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In our previous blog post we discussed how advanced AI technology &#8211; driven by Large Language Models &#8211; can be used for automatic redaction of sensitive information. 
We mentioned that very high accuracy can be achieved with such technology. In this post we will show that the latter is indeed the case. Recall We already [&hellip;]<\/p>\n","protected":false},"author":8,"featured_media":22110,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[169,159,171],"tags":[],"class_list":["post-22107","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence","category-general","category-vdr"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.4 (Yoast SEO v26.4) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Accuracy of AI-redaction - Imprima<\/title>\n<meta name=\"description\" content=\"Learn how AI automatic redaction (that really works) can save you an enormous amount of time and avoid human error.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Accuracy of AI-redaction\" \/>\n<meta property=\"og:description\" content=\"Learn how AI automatic redaction (that really works) can save you an enormous amount of time and avoid human error.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy\" \/>\n<meta property=\"og:site_name\" content=\"Imprima\" \/>\n<meta property=\"article:published_time\" content=\"2023-06-19T16:39:01+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-06-22T08:33:14+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.imprima.com\/wp-content\/uploads\/2023\/06\/blog-graphics-accuracy-v3a.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" 
\/>\n\t<meta property=\"og:image:height\" content=\"600\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Marcus Tolan\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Marcus Tolan\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy\"},\"author\":{\"name\":\"Marcus Tolan\",\"@id\":\"https:\/\/www.imprima.com\/#\/schema\/person\/91ac413cefc7cd6c4ec7034f7cef47f2\"},\"headline\":\"Accuracy of AI-redaction\",\"datePublished\":\"2023-06-19T16:39:01+00:00\",\"dateModified\":\"2023-06-22T08:33:14+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy\"},\"wordCount\":678,\"publisher\":{\"@id\":\"https:\/\/www.imprima.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.imprima.com\/wp-content\/uploads\/2023\/06\/blog-graphics-accuracy-v3a.jpg\",\"articleSection\":[\"Artificial Intelligence\",\"General\",\"VDR\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy\",\"url\":\"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy\",\"name\":\"Accuracy of AI-redaction - 
Imprima\",\"isPartOf\":{\"@id\":\"https:\/\/www.imprima.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.imprima.com\/wp-content\/uploads\/2023\/06\/blog-graphics-accuracy-v3a.jpg\",\"datePublished\":\"2023-06-19T16:39:01+00:00\",\"dateModified\":\"2023-06-22T08:33:14+00:00\",\"description\":\"Learn how AI automatic redaction (that really works) can save you an enormous amount of time and avoid human error.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy#primaryimage\",\"url\":\"https:\/\/www.imprima.com\/wp-content\/uploads\/2023\/06\/blog-graphics-accuracy-v3a.jpg\",\"contentUrl\":\"https:\/\/www.imprima.com\/wp-content\/uploads\/2023\/06\/blog-graphics-accuracy-v3a.jpg\",\"width\":1200,\"height\":600,\"caption\":\"Redaction Accuracy Graphics\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.imprima.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Accuracy of AI-redaction\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.imprima.com\/#website\",\"url\":\"https:\/\/www.imprima.com\/\",\"name\":\"Imprima\",\"description\":\"Secure Online Data Room Services by 
Imprima\",\"publisher\":{\"@id\":\"https:\/\/www.imprima.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.imprima.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.imprima.com\/#organization\",\"name\":\"Imprima\",\"url\":\"https:\/\/www.imprima.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.imprima.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.imprima.com\/wp-content\/uploads\/2021\/05\/imprima-logo-new.svg\",\"contentUrl\":\"https:\/\/www.imprima.com\/wp-content\/uploads\/2021\/05\/imprima-logo-new.svg\",\"width\":507.43,\"height\":149.18,\"caption\":\"Imprima\"},\"image\":{\"@id\":\"https:\/\/www.imprima.com\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.imprima.com\/#\/schema\/person\/91ac413cefc7cd6c4ec7034f7cef47f2\",\"name\":\"Marcus Tolan\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Accuracy of AI-redaction - Imprima","description":"Learn how AI automatic redaction (that really works) can save you an enormous amount of time and avoid human error.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy","og_locale":"en_US","og_type":"article","og_title":"Accuracy of AI-redaction","og_description":"Learn how AI automatic redaction (that really works) can save you an enormous amount of time and avoid human error.","og_url":"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy","og_site_name":"Imprima","article_published_time":"2023-06-19T16:39:01+00:00","article_modified_time":"2023-06-22T08:33:14+00:00","og_image":[{"width":1200,"height":600,"url":"https:\/\/www.imprima.com\/wp-content\/uploads\/2023\/06\/blog-graphics-accuracy-v3a.jpg","type":"image\/jpeg"}],"author":"Marcus Tolan","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Marcus Tolan","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy#article","isPartOf":{"@id":"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy"},"author":{"name":"Marcus Tolan","@id":"https:\/\/www.imprima.com\/#\/schema\/person\/91ac413cefc7cd6c4ec7034f7cef47f2"},"headline":"Accuracy of AI-redaction","datePublished":"2023-06-19T16:39:01+00:00","dateModified":"2023-06-22T08:33:14+00:00","mainEntityOfPage":{"@id":"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy"},"wordCount":678,"publisher":{"@id":"https:\/\/www.imprima.com\/#organization"},"image":{"@id":"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy#primaryimage"},"thumbnailUrl":"https:\/\/www.imprima.com\/wp-content\/uploads\/2023\/06\/blog-graphics-accuracy-v3a.jpg","articleSection":["Artificial Intelligence","General","VDR"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy","url":"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy","name":"Accuracy of AI-redaction - Imprima","isPartOf":{"@id":"https:\/\/www.imprima.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy#primaryimage"},"image":{"@id":"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy#primaryimage"},"thumbnailUrl":"https:\/\/www.imprima.com\/wp-content\/uploads\/2023\/06\/blog-graphics-accuracy-v3a.jpg","datePublished":"2023-06-19T16:39:01+00:00","dateModified":"2023-06-22T08:33:14+00:00","description":"Learn how AI automatic redaction (that really works) can save you an enormous amount of time and avoid human 
error.","breadcrumb":{"@id":"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy#primaryimage","url":"https:\/\/www.imprima.com\/wp-content\/uploads\/2023\/06\/blog-graphics-accuracy-v3a.jpg","contentUrl":"https:\/\/www.imprima.com\/wp-content\/uploads\/2023\/06\/blog-graphics-accuracy-v3a.jpg","width":1200,"height":600,"caption":"Redaction Accuracy Graphics"},{"@type":"BreadcrumbList","@id":"https:\/\/www.imprima.com\/blog\/ai-redaction-accuracy#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.imprima.com\/"},{"@type":"ListItem","position":2,"name":"Accuracy of AI-redaction"}]},{"@type":"WebSite","@id":"https:\/\/www.imprima.com\/#website","url":"https:\/\/www.imprima.com\/","name":"Imprima","description":"Secure Online Data Room Services by 
Imprima","publisher":{"@id":"https:\/\/www.imprima.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.imprima.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.imprima.com\/#organization","name":"Imprima","url":"https:\/\/www.imprima.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.imprima.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.imprima.com\/wp-content\/uploads\/2021\/05\/imprima-logo-new.svg","contentUrl":"https:\/\/www.imprima.com\/wp-content\/uploads\/2021\/05\/imprima-logo-new.svg","width":507.43,"height":149.18,"caption":"Imprima"},"image":{"@id":"https:\/\/www.imprima.com\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.imprima.com\/#\/schema\/person\/91ac413cefc7cd6c4ec7034f7cef47f2","name":"Marcus 
Tolan"}]}},"_links":{"self":[{"href":"https:\/\/www.imprima.com\/wp-json\/wp\/v2\/posts\/22107","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.imprima.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.imprima.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.imprima.com\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/www.imprima.com\/wp-json\/wp\/v2\/comments?post=22107"}],"version-history":[{"count":0,"href":"https:\/\/www.imprima.com\/wp-json\/wp\/v2\/posts\/22107\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.imprima.com\/wp-json\/wp\/v2\/media\/22110"}],"wp:attachment":[{"href":"https:\/\/www.imprima.com\/wp-json\/wp\/v2\/media?parent=22107"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.imprima.com\/wp-json\/wp\/v2\/categories?post=22107"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.imprima.com\/wp-json\/wp\/v2\/tags?post=22107"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}