    OpenIAP

    ReadPDF Duplicating Information

    General Discussion
    ocr open app pdf extract
    • fpssouz

      Hi Allan.

      After several unsuccessful attempts, I decided to share the problem below:

      OpenRPA.Utilities -> ReadPDF
      The output repeats letters and/or whole sentences from previous pages.
      I have marked the problems in the screenshots so you can see them.

      Note: for this example I used a simple workflow that copies the result to the clipboard.
      I tried other output methods (CSV, DataTable, Excel), and the results had the same problems.

      d1cb7f89-43a5-4cde-bb12-85ec79492544-image.png

      5e7f94fb-83d4-46bb-9af5-7dc862972f76-image.png

      Link to pdf document used in the example:
      download pdf example

        • Hammer

          I have the same problem too 😞

          • Allan Zimmermann, replying to @Hammer

            I have not been very active over the last few weeks.
            I'm under a lot of pressure to finish version 1.3 of OpenFlow, while also supporting my family because my mother became very ill. Once that is done, I will give OpenRPA and the forum/Rocket.Chat some love again.
            You are not forgotten or ignored, I'm just not able to answer as fast as I normally do. Sorry about that.

            • fpssouz, replying to @Allan Zimmermann

              Allan, I'm sorry, I didn't mean to pressure you; your support and attention have been beyond my expectations.
              I'm sorry about your mother, and I wish her well.

              • Hammer, replying to @Allan Zimmermann

                @allan-zimmermann
                Allan, did you get a chance to look at the OCR reader problem?

                • Allan Zimmermann, replying to @Hammer

                  I tested 6-7 different NuGet packages, and they all had issues with your PDF. PDFs are not plain text but vector graphics, so it may not always be possible to extract "real" text from them.
                  But I finally found one that seems to work.
                  Search for TikaOnDotNet and then install TikaOnDotNet.TextExtractor.
                  Then close the robot and restart it (for some reason it keeps getting stuck on installing; I need to look at that later). It should then install TikaOnDotNet and all its dependencies.
                  Add a string variable "text", add an Invoke Code activity, set it to C#, and add this code:

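                  // Invoke Code body (C#): no "using" directives here, so types are fully qualified
                  Console.WriteLine("init");
                  var textExtractor = new TikaOnDotNet.TextExtraction.TextExtractor();
                  Console.WriteLine("Extract");
                  // Change this path to point at your own PDF
                  var result = textExtractor.Extract(@"C:\Users\Allan\Downloads\test_reading_pdf.pdf");
                  Console.WriteLine("save result");
                  // Extract returns a TextExtractionResult; .Text holds the extracted plain text
                  text += result.Text;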

                  Then modify the file path to match yours.

                  • Flávio Pinheiro de Souza, replying to @Allan Zimmermann

                    @allan-zimmermann I will install it and then post the result.
                    Thank you very much.

                    • Flávio Pinheiro de Souza, replying to @Allan Zimmermann

                      @allan-zimmermann
                      Allan, here's a summary of the tests.
                      In Visual Studio it worked well (see screenshot):

                      688cfcc0-c4c4-4ba8-9e5d-3e99143a1690-image.png

                      However, in OpenRPA, the errors shown below occur:
                      cd5ac56b-2f90-4006-8124-94c8f6aa4bb1-image.png

                      LogError:
                      daef02e2-8421-4c50-a6c0-230c5a1ee628-image.png

                      Thanks for your help.

                      • Allan Zimmermann, replying to @Flávio Pinheiro de Souza

                        It's not a class or class file; Invoke Code is a function body. You cannot use using directives inside a function.
                        Remove the using part (line 1).
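                        To illustrate the point above, here is a minimal standalone C# sketch. It is not OpenRPA's actual wrapper: it only assumes, as described above, that the Invoke Code text ends up inside a method. A `using` directive is legal only at file or namespace level, so inside a method body you reference types by their fully qualified name instead. The class and method names are hypothetical.

                        ```csharp
                        using System; // legal here: using directives belong at the top of a file or namespace

                        public static class InvokeCodeSketch
                        {
                            // Hypothetical stand-in for the method an Invoke Code body is placed in.
                            // Writing "using System.IO;" inside this body would not compile,
                            // so the type is referenced by its fully qualified name instead.
                            public static string Body(string filePath)
                            {
                                return System.IO.Path.GetFileNameWithoutExtension(filePath);
                            }

                            public static void Main()
                            {
                                // Prints "test_reading_pdf"
                                Console.WriteLine(Body("Downloads/test_reading_pdf.pdf"));
                            }
                        }
                        ```

                        The same pattern is why the TikaOnDotNet snippet writes `new TikaOnDotNet.TextExtraction.TextExtractor()` in full rather than importing the namespace.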

                        • Flávio Pinheiro de Souza, replying to @Allan Zimmermann

                          @allan-zimmermann

                          The first time I ran it, I did as you instructed, but I got the error "The type or namespace name 'TikaOnDotNet' could not be found".
                          Note: the TikaOnDotNet package is already installed and works fine in Visual Studio, as I showed in the previous screenshots.

                          6896f07f-3f75-4ff4-b74e-793ab7cc68f5-image.png

                          e76ccfd4-b003-4a1f-b875-1eb4dea40f4d-image.png

                          • Allan Zimmermann, replying to @Flávio Pinheiro de Souza

                            Did you install TikaOnDotNet using the package manager in the project?

                            • Flávio Pinheiro de Souza, replying to @Allan Zimmermann

                              @allan-zimmermann No. I installed it from the Visual Studio package manager 😞
                              Can you guide me through installing this package?
                              Note: I'm running in Docker.

                              • Allan Zimmermann, replying to @Flávio Pinheiro de Souza

                                Go to "Open Project" -> select a project -> click "Open Package Manager".
                                Select "Nuget.org", type "tika", select "TikaOnDotNet", and click "Install" once you see the version number in the dropdown list.
                                What do you mean by "running in Docker"? You cannot run OpenRPA in Docker.

                                • Flávio Pinheiro de Souza, replying to @Allan Zimmermann

                                  @allan-zimmermann
                                  I want to thank you for your help and for how quickly you always respond. Thanks a lot again; everything is working.
