CS6825: Computer Vision

Project 3: Final Project

             (245 points)

Research due 11pm Oct. 28 (try to leverage your Project 1) (15 points)

Proposal due 11pm Nov. 1 (30 points) - MUST COME TO OFFICE HOURS on Nov 2 - SCHEDULE APPOINTMENT

Project due midnight Nov. 29; demos on Nov. 30 (200 points)

Deliverables (see top for due dates)

     1) See PROJECT DETAILS for requirements

     2) Research/class discussion - POST on blackboard->DiscussionBoard->Project 3 - Research Postings

    • Post a minimum of 3 articles related to your topic (MUST be good-quality articles with detailed algorithms and results)
    • For each article you must provide:
      • Author and Title of paper
      • URL to online version of paper
      • Synopsis: 2-3 paragraphs telling what this paper is about
      • Use: 1-2 paragraphs about how you might use this research or parts of it in your Project 3.

   3) Project Proposal -

Turn in the Project 3 Proposal as a Word document at Blackboard->Discussion Board->Project 3 Proposal.  BE AS TECHNICAL AS YOU CAN

  Section 1: GOAL STATEMENT   Start off by describing the problem to me -- like reading labels on flat box packages for low-vision people.

    Section 1.1 INPUT: show some typical input images (video) for the system.  Tell me what conditions it will operate under and what the constraints are: good lighting, package flat and facing the camera within +/- 10 degrees, box not rotated more than +/- 10 degrees, reading text with high contrast to the background (either dark text on a lighter/white background or light text on a dark background; show image examples), etc.

    Section 1.2 OUTPUT: tell me what the output is: say, a large-font text version of the label that the user can scroll through to read, OR output via text to speech.

  Section 2: Development Platform, Target AVD, and Device you will test it on.

    Example: will target API 19 (Android 4.4.2), using OpenCV version 2.4.10 in Android Studio, and will test on a Samsung Galaxy XXXX.

  Section 3: ALGORITHMS:   Discuss any algorithms and the references you used to understand them, and any source code from OpenCV or other parties you might use.

    >>>>> Here is a partial example (I have not written enough, but you get the idea of the content). You will have Section 3.1 Overview, and then Sections 3.2-3.* depending on how many components/steps are in your proposed system.

     Section 3.1: OVERVIEW:  I will have the following system components: Label Area Finding, OCR in each Label Area, and Output Results to User.  The main added value I am contributing is the Label Area Finding; the OCR will be done using already existing code.  Of course, the integration into an app is also important.

    Example Section 3.2: Label Area Finding:    I am going to take a picture of the box and find potential Label Areas, using the following unique idea that I came up with:

    Do Color Blob Detection using the XYZ algorithm (see https://github.com/Itseez/opencv/tree/master/samples/android/color-blob-detection for reference).

    Then I am going to select the top 5 colors present based on their area (histogram).  I will have to decide how close in color a pixel must be to each of the 5 colors to be counted in the area for that color; this threshold will be determined experimentally.

    For each of the 5 top colors, I will create a Label Area: a subimage of the original image, the smallest rectangle that encompasses all of the pixels of that color (ideally smaller than the entire original image).

    I am going to pass the 5 top Label Areas on to the processing in Section 3.3.

    After looking at the results for typical input images, I may adjust and choose a smaller or larger number than 5 (this will be a parameter setting in my app called Detect_Number_Label_Areas)
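    The steps above (part of this proposal example, not required code) could be sketched as follows. This is a minimal NumPy sketch, under my own assumptions: a simple per-channel quantization stands in for the color-blob step, and `bin_size` plays the role of the experimentally tuned "close enough in color" threshold, with `n_colors` being the Detect_Number_Label_Areas parameter:

```python
import numpy as np

def top_color_label_areas(img, n_colors=5, bin_size=32):
    """Find bounding boxes for the n most common (quantized) colors.

    img: HxWx3 uint8 RGB array.  bin_size controls how close two pixels
    must be to count as the same color (tuned experimentally).
    Returns a list of (color_bin, (top, left, bottom, right)) tuples,
    ordered from most to least common color.
    """
    qimg = img // bin_size                               # quantize each channel
    bins, counts = np.unique(qimg.reshape(-1, 3), axis=0, return_counts=True)
    order = np.argsort(counts)[::-1][:n_colors]          # top-N colors by pixel area
    areas = []
    for b in bins[order]:
        mask = np.all(qimg == b, axis=2)
        ys, xs = np.nonzero(mask)
        # smallest rectangle encompassing all pixels of this color
        bbox = (int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max()))
        areas.append((tuple(int(c) for c in b), bbox))
    return areas
```

    Each returned bounding box would then be cropped out of the original image and passed on as a Label Area. Note that a single bounding box per color can span disconnected blobs; a real color-blob detector would separate them.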

    Example Section 3.3:  OCR in a Label Area:  For each Label Area found by the previous component, I perform OCR.  This is done using the Tesseract OCR Engine (maintained by Google); see http://gaut.am/making-an-ocr-android-app-using-tesseract/ and https://github.com/rmtheis/android-ocr and https://play.google.com/store/apps/details?id=edu.sfsu.cs.orange.ocr&hl=en
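    One way this stage could be wired (a sketch, not the assignment's required code): the component only needs a callable that maps a label-area image to text, so a Tesseract wrapper can be plugged in behind a simple interface. The `recognize` parameter below is a hypothetical stand-in for the real engine:

```python
def ocr_label_areas(label_areas, recognize):
    """Run OCR over each label-area subimage.

    label_areas: list of subimages (whatever type the recognizer accepts).
    recognize:   callable(image) -> str, e.g. a Tesseract wrapper.
    Empty results are dropped so later stages only see real text.
    """
    results = []
    for area in label_areas:
        text = recognize(area).strip()
        if text:                      # skip areas where OCR found nothing
            results.append(text)
    return results
```

    Keeping the recognizer behind a callable like this also makes the pipeline testable without Tesseract installed.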

    Example Section 3.4:  Reporting Results:  I will present the user with both a blown-up text version of the label (black text on a white background) and text to speech.  The first part uses standard Android GUI elements: a TextView contained in a ScrollView.  The second part, text to speech, will be done using the standard Android TextToSpeech class; see http://developer.android.com/reference/android/speech/tts/TextToSpeech.html
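    The scrollable large-font display half of this could be as simple as the layout fragment below (a sketch only; the file name and view ID are made up, but ScrollView, TextView, and the attributes shown are standard Android):

```xml
<!-- res/layout/activity_results.xml (sketch; the view ID is hypothetical) -->
<ScrollView xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <!-- Large black-on-white label text the user can scroll through -->
    <TextView
        android:id="@+id/label_text"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:background="@android:color/white"
        android:textColor="@android:color/black"
        android:textSize="32sp" />
</ScrollView>
```

    The activity would then set the OCR result on the TextView and pass the same string to TextToSpeech.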

  Section 4: GUI interface       

    SHOW ALL interfaces and the interactions that change them -- either draw them by hand AND scan into your Word document, or use a free mock-interface tool (try http://balsamiq.com/).



   4) Project Due - Turn in to BB->DiscussionBoard->Project 3-Results

    1) zipped up (using zip standard only) file containing all code (in AndroidStudio environment).
    2) PDF Document

    PDF from a Word document (a sample, found here, is the template you MUST use) that details the following:
    • Section 1 Execution Instructions:
      Instructions for me to download and run your code. YOU NEED to show me screenshots of you doing this from your uploaded Blackboard code; this forces you to make sure that I can run your code. You MUST have the following screenshots AND give a description of what to do: screenshot 1.1 = screenshot of your files uploaded to the Project 3 turn-in folder on Blackboard

      FIGURE HERE
      screenshot 1.2 = directory view of the "temp" directory you unzipped the file to, showing the unzipped files and directory structure.


      FIGURE HERE
      screenshot 1.3 = Android Studio running, where you have opened the project file in the "temp" directory.

      FIGURE HERE
      screenshot 1.4 = running the application - show a screenshot of it running. If I must do something beyond simply hitting the "run" button, you need to give screenshots and step-by-step instructions.

       

    • Section 2 Code Description
       A description of how the code is structured and the state of how it works. Give a description for each filename listed.

    • Section 3 Testing:
       Here you give screenshots of you running the various stages of the program, as detailed here:
    • section 3.1: starting application -


      FIGURE HERE
      screenshot 3.1a= showing icon and resulting starting GUI

       

      section 3.2: use step 1

        You will have sections showing you using the different interfaces and the results of the application.
      FIGURE HERE
      screenshot 3.2a = screen shot of active image in your application

      FIGURE HERE
      screenshot 3.2b = screen shot showing the results of application running on this image(s)

    • Section 4 Comments
      Optional: any comments you have regarding your code (necessary if your code is not working; you need to tell me in detail what the problem is or what is missing)

    • Section 5 YOUTUBE URL - URL to a YouTube video: it must go over a LIVE demonstration of the program working and describe what is working, what is not working, and how well it works (accuracy -- e.g. 2/10 times, never, 9/10 times, whatever)


© Lynne Grewe