Saturday, 24 July 2021

If the language of your S-99 is in Latin script and does not use (a lot of) accents or other diacritical marks, you can probably copy text from the S-99 and paste into the talk import wizard of TheocBase. The ‘if’ means that for quite a number of languages the import wizard will not work smoothly or at all.

Marlon B suggested here to use OCR software to get around that issue. It is rather simple to use, but it doesn't look that way, which may put off some users. Follow these steps and you will probably save a lot of time.

Go to https://github.com/manisandro/gImageReader, ignore everything you see, scroll down to the section “Installation” and click the link for your operating system. (Alternatively, use https://sourceforge.net/projects/gimagereader/)

Find the latest release toward the top, and scroll down just enough to see a couple of files that you can download. For Windows, choose the file that ends in x86_64.exe, then download and install it. (If you get a message that this program is meant for 64-bit computers, go back, then download and install the file that ends in i686.exe instead.)

The standard installation only installs the English interface. If you choose ‘Standard (localised)’, it will adopt the language of your operating system, if available. Your choice here is inconsequential and has nothing to do with the language you are scanning.

Open the program. In the upper right-hand corner you see a tool icon. Click it and choose ‘manage languages’. Tick the box(es) next to the language(s) that you want to extract from your S-99 talk list and click Apply. When it is done, click Close.

To the far left, under ‘Sources’/Files, click the Add Image button and select the S-99 pdf. You’ll get a message that the pdf already contains text. Hit OK. You will only see the first page and cannot scroll. Don’t worry.

In the top bar, a few buttons now become clickable. Under OCR mode, choose ‘plain text’. The button just right of that has a little arrow next to it. Click the arrow (not the button) and specify the language to be recognised. If it is a language you have just added, you will be warned about a dictionary not being installed. Click Install, wait until done, then click OK.

Select the talk themes as follows:

With your cursor, start in the upper left-hand corner just above talk #1 and take some extra space left of the number column (so it will "see" numbers larger than 99). Draw all the way down to just under the last talk on that page, but above the form number. Toward the right, the selection should stop just short of the date grid. If you select more than only numbers and themes, you get a lot of clutter that you have to remove later.

Because you selected an area, the ‘recognise all’ button has changed to ‘recognise selection’. Make sure it is scanning for the correct language, and click. NOW you will be asked about the number of pages. Click Multiple Pages; it will preselect all pages already. Leave everything as proposed and click OK. (This works because it is the same area on each page.)

Scanning will start and the output will appear in a pane toward the right. Select all, copy and temporarily paste into a document, where you can more easily check for errors. (Not all digits were recognised in my case). Do the necessary cleanup and corrections, then paste into TheocBase’s talk import wizard.
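
If you prefer to let a script do the first pass of that check, here is a minimal Python sketch. It is only an illustration and not part of gImageReader or TheocBase. It assumes each line of the OCR output starts with the talk number followed by the theme; the function names `parse_talks` and `missing_numbers` and the sample text are made up for this example. Lines where the number was not recognised are collected separately so you can fix them by hand, and gaps in the numbering are listed.

```python
import re

# Assumed line shape: talk number (1-3 digits), optional punctuation, then the theme.
LINE_RE = re.compile(r"^\s*(\d{1,3})\s*[.):\-]?\s+(.+?)\s*$")


def parse_talks(ocr_text):
    """Split OCR output into (number, theme) pairs plus lines that need review."""
    talks = []
    rejects = []
    for raw in ocr_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip empty lines between entries
        match = LINE_RE.match(line)
        if match:
            talks.append((int(match.group(1)), match.group(2)))
        else:
            rejects.append(line)  # number missing or misread: check by hand
    return talks, rejects


def missing_numbers(talks):
    """Return talk numbers absent between the lowest and highest number found."""
    found = {number for number, _ in talks}
    if not found:
        return []
    return [n for n in range(min(found), max(found) + 1) if n not in found]


if __name__ == "__main__":
    # Made-up sample standing in for text copied from the output pane.
    sample = "1 First sample theme\n\n2. Second sample theme\nTheme with lost number\n4 Fourth sample theme\n"
    talks, rejects = parse_talks(sample)
    for number, theme in talks:
        print(number, theme)
    print("Check by hand:", rejects)
    print("Numbers not found:", missing_numbers(talks))
```

A gap in the numbers is not always an error, since a talk list can skip numbers, so compare the reported gaps against the S-99 itself before correcting anything.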


https://www.theocbase.net/support-forum/post/1097-donations.html

For accessing the database my personal preference is http://sqlitebrowser.org/

For editing templates I now use https://code.visualstudio.com/ 

 

