Core: OCR (text recognition) (10 ideas) – Have an idea? Share it!

How can we improve our Core features ...

Enhance OCR accuracy to distinguish currency symbol and number

Currently, in Document Matching, Form Extraction, etc., a currency symbol such as "￥" is recognized as the number "1" or "7" when extracting an amount of money, especially if it is handwritten.
Ideally, OCR accuracy would be improved so that currency symbols are distinguished from numbers.

18 votes

Under Review · 2 comments · OCR (text recognition)

Date Recognition - dd/mm/yyyy or mm/dd/yyyy

Regarding date recognition, a date may be recognised as dd/mm/yyyy or mm/dd/yyyy. It is particularly difficult to determine whether the date is mm/dd/yyyy or dd/mm/yyyy when the day of the month is earlier than the 13th.
For example, if 12/4/2025 is snipped into a cell, it is not possible to tell whether it means 12 April or 4 December, so the source document has to be checked, which doubles the effort. It would be desirable to be able to snip dates consistently into a specific format.
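
To make the ambiguity concrete, here is a minimal Python sketch that lists every valid calendar date a slash-separated string could mean. The function name `interpret_slash_date` is hypothetical and only for illustration; it says nothing about how DataSnipper handles dates internally.

```python
from datetime import date


def interpret_slash_date(text: str) -> list[date]:
    """Return every valid calendar date that 'a/b/yyyy' could mean.

    Hypothetical helper: tries both day/month and month/day readings.
    """
    first, second, year = (int(part) for part in text.split("/"))
    candidates: list[date] = []
    # Try dd/mm/yyyy first, then mm/dd/yyyy.
    for day, month in ((first, second), (second, first)):
        try:
            candidate = date(year, month, day)
        except ValueError:
            continue  # e.g. month 25 does not exist
        if candidate not in candidates:
            candidates.append(candidate)
    return candidates


for raw in ("12/4/2025", "25/4/2025"):
    readings = interpret_slash_date(raw)
    status = "ambiguous" if len(readings) > 1 else "unambiguous"
    print(raw, status, [d.isoformat() for d in readings])
```

A string is only ambiguous when both numbers are 12 or lower and differ from each other, which is why a fixed, user-selected format would remove the need to re-check the source document.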

4 votes

New · 0 comments · OCR (text recognition)

Recognition of alternative decimal / thousands separators

There are many types of documents that use alternative decimal / thousands separators, such as tax forms or payroll registers. These forms or system print-outs often use pre-set bars or grids to separate figures. It would be extremely helpful if DataSnipper could detect the logic behind these formats and extract the information correctly.
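
As an illustration of what correct extraction would involve, the following Python sketch converts a figure to a number once the separators are known. `parse_amount` and its parameters are hypothetical names, and the sketch assumes the caller already knows which separators the document uses; detecting them automatically, which is what this idea asks for, is not shown.

```python
from decimal import Decimal


def parse_amount(text: str, decimal_sep: str = ",", thousands_sep: str = ".") -> Decimal:
    """Parse a figure written with the given separators (hypothetical helper)."""
    # Drop ordinary and non-breaking spaces that print-outs often insert.
    cleaned = text.strip().replace(" ", "").replace("\u00a0", "")
    if thousands_sep:
        cleaned = cleaned.replace(thousands_sep, "")
    # Decimal() expects "." as the decimal point.
    cleaned = cleaned.replace(decimal_sep, ".")
    return Decimal(cleaned)


print(parse_amount("1.234.567,89"))                                 # continental style
print(parse_amount("1'234.56", decimal_sep=".", thousands_sep="'"))  # apostrophe grouping
print(parse_amount("12|345,00", thousands_sep="|"))                  # grid bar between figures
```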

5 votes

Gathering Feedback · 0 comments · OCR (text recognition)

OCR Health Score

Implement a confidence scoring system to provide users with insights into the accuracy of OCR-extracted text. I envision this feature offering both document-level and word/number-level confidence scores, empowering users to evaluate the reliability of the extracted data and make informed decisions.

Granular Confidence Levels: Clearly define confidence score ranges (e.g., high, medium, low) and provide corresponding probability values for better interpretability.
Visual Indicators: Incorporate visual cues (e.g., color-coding, icons) to quickly convey confidence levels, enhancing user experience.
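
One possible shape for such a score, sketched in Python. The thresholds (0.9 and 0.7), the function names, and the use of a simple mean for the document-level score are all assumptions made for illustration, not anything DataSnipper has specified.

```python
def confidence_level(probability: float, high: float = 0.9, medium: float = 0.7) -> str:
    """Map a per-word OCR probability to a label (thresholds are assumed)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= high:
        return "high"
    if probability >= medium:
        return "medium"
    return "low"


def document_confidence(word_probabilities: list[float]) -> float:
    """Document-level score as the plain mean of word-level scores (an assumption)."""
    if not word_probabilities:
        raise ValueError("no words to score")
    return sum(word_probabilities) / len(word_probabilities)


words = {"Total": 0.98, "1,250.00": 0.74, "EUR": 0.41}
for word, probability in words.items():
    print(f"{word}: {probability:.2f} ({confidence_level(probability)})")
print(f"document: {document_confidence(list(words.values())):.2f}")
```

The labels could then drive the colour-coding or icons suggested above, with the raw probability kept available for users who want the exact value.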

6 votes

Under Review · 0 comments · OCR (text recognition)

Export OCR PDF file >> keep OCR

If I export a PDF to my computer after running OCR on it, the text is no longer recognized and I can't search it anymore using, for example, Acrobat. It would be nice and useful if a PDF that has been text-recognized stayed that way after export.

2 votes

Under Review · 0 comments · OCR (text recognition)

Provide further options for configuring private OCR endpoints

Currently, it is only possible to authenticate to the OCR endpoint using an API key. Other (more secure) authentication methods can be configured on these resources within Azure. It would be beneficial if DataSnipper supported changing the authentication method it uses.

2 votes

0 comments · OCR (text recognition)

Deferred · Justin (Admin, DataSnipper) responded

This has been deferred (not planned for the next 6 months).
Please continue to share this idea, we will continue to monitor for votes and comments!
Run OCR only for the portions of that document which do not contain computer generated text

OCR currently overwrites all computer-generated text, sometimes with worse results. OCR may be required when one portion of a document contains computer-generated text but another portion does not. In these instances it would be ideal to use OCR only to recognize the text that is not yet included in the text layer of the file.

4 votes

Under Review · 0 comments · OCR (text recognition)

Dash to Zero Conversion

Is it feasible to implement an automatic dash-to-zero conversion for numerical data extracted from tables? By identifying tables with primarily numerical values, the system could infer that dashes represent zeros and replace them accordingly. This feature would enhance user experience, especially when dealing with large datasets.
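
The inference described here can be sketched in a few lines of Python. Everything in this block is hypothetical: the names `looks_numeric` and `dashes_to_zero`, the 50% cut-off for "primarily numerical", and the choice to work column by column are assumptions for illustration only.

```python
# Hyphen, en dash, em dash, and minus sign, written as escapes.
DASHES = {"-", "\u2013", "\u2014", "\u2212"}


def looks_numeric(cell: str) -> bool:
    """True if the cell parses as a number once comma grouping is removed."""
    try:
        float(cell.strip().replace(",", ""))
    except ValueError:
        return False
    return True


def dashes_to_zero(column: list[str], min_numeric_share: float = 0.5) -> list[str]:
    """Replace lone dashes with "0" when the column is primarily numeric."""
    others = [c for c in column if c.strip() and c.strip() not in DASHES]
    if not others:
        return list(column)  # nothing to infer from
    numeric_share = sum(looks_numeric(c) for c in others) / len(others)
    if numeric_share < min_numeric_share:
        return list(column)  # mostly text: leave the dashes alone
    return ["0" if c.strip() in DASHES else c for c in column]


print(dashes_to_zero(["1,200", "-", "350", "\u2013"]))
print(dashes_to_zero(["n/a", "-", "pending"]))
```

Leaving text-heavy columns untouched matters because a dash in a description column usually is not a zero.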

2 votes

Deferred · 0 comments · OCR (text recognition)

OCR - Punctuation standardisation - apostrophes & quote marks

When running OCR on documents, some punctuation, particularly apostrophes & quote marks, comes out differently depending on the font, e.g. Group's vs. Group’s.

This causes issues when using the financial statement suite & version compare features in particular, as a lot of false changes get flagged, which muddies the waters significantly. If logic could be built in so that the characters used for apostrophes & quote marks are consistent across the board, that would be ideal. I think this setting would be best as the default, with an option to switch back to the more granular, character-specific behaviour if the need arises.
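
A minimal Python sketch of the kind of standardisation meant here, mapping the common typographic quote characters to their plain ASCII forms before two texts are compared. The name `normalise_quotes` and the exact set of characters covered are assumptions; a real implementation might need to cover more variants.

```python
# Typographic single and double quotes mapped to their ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / typographic apostrophe
    "\u201a": "'",  # single low-9 quotation mark
    "\u201b": "'",  # single high-reversed-9 quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark
    "\u201f": '"',  # double high-reversed-9 quotation mark
})


def normalise_quotes(text: str) -> str:
    """Return text with typographic quotes replaced by ASCII ones."""
    return text.translate(QUOTE_MAP)


old_version = "The Group's results"
new_version = "The Group\u2019s results"
print(old_version == new_version)                                      # raw comparison
print(normalise_quotes(old_version) == normalise_quotes(new_version))  # after normalising
```

Applying the mapping to both versions before comparison means a difference that is purely typographic would not be flagged as a change.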

2 votes

0 comments · OCR (text recognition)

Deferred · Justin (Admin, DataSnipper) responded

This has been deferred (not planned for the next 6 months).
Please continue to share this idea, we will continue to monitor for votes and comments!
Optimize PDF

Add a button, with an alert shown when a PDF document is uploaded, to run or save the document as Optimized and eliminate the metadata on a PDF document that can cause issues. This would be similar to how you run OCR by selecting a button.

3 votes

0 comments · OCR (text recognition)

Deferred · Justin (Admin, DataSnipper) responded

This has been deferred (not planned for the next 6 months).
Please continue to share this idea, we will continue to monitor for votes and comments!
