Export OCR PDF file >> keep OCR
If I run OCR on a PDF and then export it to my computer, the recognized text is lost, so I can no longer search it in, for example, Acrobat. It would be useful if a PDF that has been text-recognized stayed that way after export.
1 vote
Provide further options for configuring private OCR endpoints
Currently, the only way to authenticate to the OCR endpoint is with an API key. Azure supports other, more secure authentication methods for these resources. It would be beneficial if DataSnipper let users configure which authentication method it uses.
1 vote
Dash to Zero Conversion
Would it be feasible to implement automatic dash-to-zero conversion for numerical data extracted from tables? By identifying tables that contain primarily numerical values, the system could infer that dashes represent zeros and replace them accordingly. This would improve the user experience, especially when working with large datasets.
1 vote
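A minimal sketch of the inference described above, assuming a table arrives as rows of cell strings. "Primarily numerical" is interpreted here as more than half of the non-empty, non-dash cells parsing as numbers; the threshold, the dash characters, and the number pattern are all illustrative assumptions, not DataSnipper behaviour.

```python
import re

# Characters treated as "dash" placeholders (assumption: hyphen, en dash, em dash).
DASHES = {"-", "\u2013", "\u2014"}
# Accepts e.g. 2023, 1,000, 12.5, -7 and accounting negatives like (500).
NUMBER_RE = re.compile(r"^\(?-?\d[\d,]*(\.\d+)?\)?$")


def is_numeric(cell: str) -> bool:
    return bool(NUMBER_RE.match(cell.strip()))


def convert_dashes(table: list[list[str]], threshold: float = 0.5) -> list[list[str]]:
    """Replace dash-only cells with "0" when the table is mostly numeric.

    threshold is a hypothetical cut-off: the share of non-empty, non-dash
    cells that must be numeric before dashes are read as zeros.
    """
    cells = [c.strip() for row in table for c in row if c.strip()]
    values = [c for c in cells if c not in DASHES]
    if not values:
        return table  # nothing to infer from
    numeric_share = sum(is_numeric(c) for c in values) / len(values)
    if numeric_share <= threshold:
        return table  # mostly text: a dash may mean something other than zero
    return [["0" if c.strip() in DASHES else c for c in row] for row in table]
```

For example, in a small financial table with a label column, the "-" under 2022 becomes "0", while a mostly textual table is returned unchanged. A per-column check would be a natural refinement for tables that mix text and number columns.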
OCR Health Score
Implement a confidence scoring system that gives users insight into the accuracy of OCR-extracted text. I envision this feature offering confidence scores at both the document level and the word/number level, so users can evaluate the reliability of the extracted data and make informed decisions.
Granular Confidence Levels: Clearly define confidence score ranges (e.g., high, medium, low) and provide corresponding probability values for better interpretability.
Visual Indicators: Incorporate visual cues (e.g., color-coding, icons) to quickly convey confidence levels, enhancing the user experience.
1 vote
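The banding and color-coding ideas above could be sketched roughly as follows, assuming the OCR engine reports a per-word confidence between 0 and 1. The band boundaries, the color names, and the character-weighted document score are hypothetical choices for illustration; real cut-offs would need calibrating against actual OCR errors.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    confidence: float  # 0.0 to 1.0, assumed to come from the OCR engine


# Hypothetical band boundaries.
HIGH = 0.90
MEDIUM = 0.70

# Hypothetical color-coding for the visual indicator.
COLORS = {"high": "green", "medium": "amber", "low": "red"}


def band(score: float) -> str:
    """Map a probability to a high/medium/low band."""
    if score >= HIGH:
        return "high"
    if score >= MEDIUM:
        return "medium"
    return "low"


def indicator(score: float) -> str:
    """Color used to highlight a word or document with this score."""
    return COLORS[band(score)]


def document_score(words: list[Word]) -> float:
    """Character-weighted mean of word confidences (one possible aggregation)."""
    total_chars = sum(len(w.text) for w in words)
    if total_chars == 0:
        return 0.0
    return sum(w.confidence * len(w.text) for w in words) / total_chars
```

With a well-read "Revenue" (0.99) and a doubtful "1,250" (0.62), the number is flagged red while the document as a whole lands in the medium band. Weighting by characters is just one option; a minimum over numeric tokens might suit audit work better, since a single misread figure matters more than an average.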
Run OCR only on the portions of a document that do not contain computer-generated text
OCR currently overwrites all computer-generated text, sometimes with worse results. OCR may still be needed when one portion of a document contains computer-generated text and another portion does not. In these cases it would be ideal to use OCR only to recognize the text that is not yet included in the file's text layer.
1 vote
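One way to picture this selective approach is sketched below at page granularity, which is a simplifying assumption: the request is about "portions", and the same test could be applied to regions or bounding boxes instead. The `Page` type, the `min_chars` threshold, and the shape of the OCR results are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    text_layer: str  # text already present in the file (computer-generated)


def pages_needing_ocr(pages: list[Page], min_chars: int = 20) -> list[int]:
    """Return page numbers whose text layer is empty or too sparse to trust.

    min_chars is a hypothetical threshold; pages at or above it keep their
    computer-generated text untouched.
    """
    return [p.number for p in pages if len(p.text_layer.strip()) < min_chars]


def merge_text(pages: list[Page], ocr_results: dict[int, str], min_chars: int = 20) -> dict[int, str]:
    """Use OCR output only for pages that need it; keep the original text elsewhere."""
    need = set(pages_needing_ocr(pages, min_chars))
    return {
        p.number: ocr_results.get(p.number, "") if p.number in need else p.text_layer
        for p in pages
    }
```

Only the pages returned by `pages_needing_ocr` would be sent to the OCR engine, so the computer-generated text on the remaining pages is never overwritten.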
OCR - Punctuation standardisation - apostrophes & quote marks
When running OCR on documents, some punctuation (particularly apostrophes and quote marks) comes out differently depending on the font, e.g. Group's vs. Group’s.
This causes issues with the financial statement suite and version compare features in particular, as many false changes get flagged, which muddies the waters significantly. If logic could be built in so that the characters used for apostrophes and quote marks are consistent across the board, that would be ideal. I think this setting would be best as the default, with an option to switch back to…
1 vote
This has been deferred (not planned for the next 6 months).
Please continue to share this idea, we will continue to monitor for votes and comments!
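Standardising apostrophes and quote marks as described above could be as simple as a character translation applied to OCR output before comparison. The sketch below maps typographic (curly) variants to straight ASCII quotes. The direction (straight rather than curly) and the exact set of characters are assumptions; the request only asks for consistency.

```python
# Typographic quote variants mapped to straight ASCII equivalents.
# The chosen set is an assumption; extend it if other variants appear.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / typographic apostrophe
    "\u201a": "'",  # single low-9 quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark
})


def normalise_quotes(text: str) -> str:
    """Return text with all mapped quote variants replaced."""
    return text.translate(QUOTE_MAP)
```

If both versions of a document are normalised this way before a version compare, "Group's" and "Group’s" become identical and are no longer flagged as a change.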