CS6825: Computer Vision

Project 3: Final Project (230 points)

Due Dates:
    Research: midnight, May 4 (try to leverage your Project 1)
    Proposal: midnight, May 10
    Project: due before start of class, June 1
    Extra Credit: due before start of class, June 1

Evaluation Guidelines (process to be determined)


This is a unique project in that you will choose to do ONLY ONE of two options; your choice is part of the proposal that is due. Once you have selected an option during the proposal phase, you will NOT be able to change it. Given the time remaining and this being the last project, it is unlikely you could finish if you tried to change options at that point, even if I allowed it. We have a limited number of Kinect devices, so if more people want that option than we have devices, we will select randomly. If you have your own Kinect (and have, or are willing to purchase, a separate power/USB cord, around $10), then you may choose the Kinect option if you want. If you take a Kinect with you, you must promise to return it, in good condition, at our meeting towards the end of the quarter as designated by the instructor.

Option 1: Kinect Cyber Physical System

Option 2: Mobile Imaging Application on Android

Extra Credit (Option 1 only): real-world deployment and experimentation with a user base (beta testing)

Extra Credit (Option 2 only): real-world deployment and experimentation with a user base (beta testing)

Deliverables (see top for due dates)

 

     1) See EACH option for deliverables

     2) Research/class discussion - POST on blackboard->DiscussionBoard->Project 3 - Research Postings

    • Post a minimum of 3 articles related to your topic (these MUST be good-quality articles with detailed algorithms and results)
    • For each article you must provide:
      • Author and Title of paper
      • URL to online version of paper
      • Synopsis: 2-3 paragraphs telling what this paper is about
      • Use: 1-2 paragraphs about how you might use this research or parts of it in your Project 3.



    3) Review Research - Post your replies to each of your 3 people's postings on

blackboard->DiscussionBoard->Project 3 - Research Postings


    • Read 3 different people's research (that is 3 people, each with 3 articles, so 9 articles in total that you will be reviewing). Respond to each of the 3 people's research postings as a reply to their original research posting on the discussion board.


   4) Project Proposal -

Turn in your Project 3 Proposal as a Word document at Blackboard->Projects->Project 3 Proposal. BE AS TECHNICAL AS YOU CAN.


  Section 1: GOAL STATEMENT   Start off by describing the problem to me -- like reading labels on flat box packages for low-vision people.

    Section 1.1 INPUT: show some typical input images (video) for the system. Tell me what conditions it will operate under / what the constraints are: good lighting, package flat and facing the camera within +/- 10 degrees, box not rotated more than +/- 10 degrees, reading text with high contrast to the background, either dark text on a lighter/white background or light text on a dark background (show image examples), etc.

    Section 1.2 OUTPUT: tell me what the output is: say, the text of the label in a large font that the user can scroll through to read, OR text-to-speech output.

  Section 2: Development Platform and Target AVD and Device you will test it on. 

    example: will target API 19 (Android 4.4.2), using OpenCV version 2.4.10 in Eclipse, and will test on a Samsung Galaxy XXXX

  Section 3: ALGORITHMS:   discuss any algorithms and the references you used to understand them, and any source code from OpenCV or other parties you might use.

    >>>>> Here is a partial example (I have not written enough here, but you get the idea of the content). You will have Section 3.1 Overview, and then Sections 3.2-3.* depending on how many components/steps are in your proposed system.

     Section 3.1: OVERVIEW:  I will have the following system components: Label Area Finding, OCR in each Label Area, and Output Results to User. The main addition I am contributing as added value is the Label Area Finding; the OCR will be done using already-existing code. Of course, the integration into an app is also important.

    example Section 3.2: Label Area Finding:    I am going to take a picture of the box and find potential Label Areas.   I am going to do this using the following unique idea that I came up with:

    Do Color Blob Detection using the XYZ algorithm; see https://github.com/Itseez/opencv/tree/master/samples/android/color-blob-detection for reference.

    Then I am going to select the top 5 colors present, based on their area (histogram). I will have to decide how close in color a pixel must be to one of the 5 colors to be counted in that color's area. This will be determined experimentally.

    For each of the 5 top colors, I will create a Label Area (a sub-image of the original image): ideally smaller than the entire original image, it is the rectangle that encompasses all of the pixels of that color.

    I am going to pass the 5 top Label Areas for processing to Section 3.3

    After looking at the results for typical input images, I may adjust and choose a smaller or larger number than 5 (this will be a parameter setting in my app called Detect_Number_Label_Areas)
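
    A minimal sketch of what this component might look like in OpenCV's Java bindings (the 2.4.x API named in Section 2). The HSV tolerance values, the class and method names, and the default of 5 areas are illustrative assumptions tied to the Detect_Number_Label_Areas parameter above, not a prescribed implementation:

        import java.util.ArrayList;
        import java.util.List;
        import org.opencv.core.Core;
        import org.opencv.core.Mat;
        import org.opencv.core.MatOfPoint;
        import org.opencv.core.Rect;
        import org.opencv.core.Scalar;
        import org.opencv.imgproc.Imgproc;

        public class LabelAreaFinder {

            // Tunable parameter from the proposal text (Detect_Number_Label_Areas).
            private int detectNumberLabelAreas = 5;

            // "Close enough in color" tolerance per HSV channel, to be set experimentally.
            private double[] tol = {10, 60, 60};

            // rgbImage: 3-channel RGB Mat (convert RGBA camera frames first).
            // topColorsHsv: the dominant colors found by the histogram step.
            public List<Mat> findLabelAreas(Mat rgbImage, List<Scalar> topColorsHsv) {
                Mat hsv = new Mat();
                Imgproc.cvtColor(rgbImage, hsv, Imgproc.COLOR_RGB2HSV);

                List<Mat> labelAreas = new ArrayList<Mat>();
                int n = Math.min(detectNumberLabelAreas, topColorsHsv.size());
                for (int i = 0; i < n; i++) {
                    Scalar c = topColorsHsv.get(i);
                    Scalar lower = new Scalar(c.val[0] - tol[0], c.val[1] - tol[1], c.val[2] - tol[2]);
                    Scalar upper = new Scalar(c.val[0] + tol[0], c.val[1] + tol[1], c.val[2] + tol[2]);

                    // Mask of pixels close enough to this dominant color.
                    Mat mask = new Mat();
                    Core.inRange(hsv, lower, upper, mask);

                    // Bounding rectangle over all matching pixels -> one Label Area.
                    MatOfPoint points = new MatOfPoint();
                    Core.findNonZero(mask, points);
                    if (!points.empty()) {
                        Rect box = Imgproc.boundingRect(points);
                        labelAreas.add(rgbImage.submat(box).clone());
                    }
                    mask.release();
                }
                hsv.release();
                return labelAreas;
            }
        }

    The per-channel tolerance plays the role of the "close enough in color" threshold above; note that hue wraps around at the red end of the range, which a real implementation would have to handle.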

    example Section 3.3: OCR in a Label Area: For each Label Area found by the previous component, I perform OCR. This is done using the Tesseract OCR Engine (maintained by Google); see http://gaut.am/making-an-ocr-android-app-using-tesseract/ , https://github.com/rmtheis/android-ocr , and https://play.google.com/store/apps/details?id=edu.sfsu.cs.orange.ocr&hl=en
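
    As a rough sketch, assuming the tess-two wrapper from the rmtheis project linked above, OCR on one Label Area could look like the following. The data path and the "eng" language pack are assumptions, and an OpenCV Mat would first be converted to a Bitmap (e.g., via org.opencv.android.Utils.matToBitmap):

        import android.graphics.Bitmap;
        import com.googlecode.tesseract.android.TessBaseAPI;

        public class LabelAreaOcr {

            // Minimal sketch: OCR one Label Area bitmap with tess-two.
            // Assumes eng.traineddata has been copied under dataPath/tessdata/.
            public static String recognize(Bitmap labelArea, String dataPath) {
                TessBaseAPI tess = new TessBaseAPI();
                tess.init(dataPath, "eng");   // load the English language pack
                tess.setImage(labelArea);     // Label Area sub-image from Section 3.2
                String text = tess.getUTF8Text();
                tess.end();                   // free native Tesseract resources
                return text;
            }
        }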

    example Section 3.4: Reporting Results: I will present the user with both a blown-up text version of the label (black text on a white background) and text-to-speech. The first part uses standard Android GUI elements: a TextView contained in a ScrollView. The second part, text-to-speech, will be done using the standard Android TextToSpeech class; see http://developer.android.com/reference/android/speech/tts/TextToSpeech.html
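
    A minimal sketch of this reporting step, using the standard TextToSpeech class from the link above together with a TextView inside a ScrollView. The layout and view IDs (R.layout.results, R.id.label_text) and the "labelText" extra are hypothetical names for illustration:

        import android.app.Activity;
        import android.os.Bundle;
        import android.speech.tts.TextToSpeech;
        import android.widget.TextView;
        import java.util.Locale;

        public class ResultsActivity extends Activity implements TextToSpeech.OnInitListener {

            private TextToSpeech tts;
            private String labelText;   // OCR output from Section 3.3

            @Override
            protected void onCreate(Bundle savedInstanceState) {
                super.onCreate(savedInstanceState);
                setContentView(R.layout.results);   // ScrollView wrapping a TextView (hypothetical layout)
                labelText = getIntent().getStringExtra("labelText");

                // Part 1: blown-up text the user can scroll through.
                TextView view = (TextView) findViewById(R.id.label_text);  // hypothetical ID
                view.setTextSize(32);               // large font, in sp
                view.setText(labelText);

                // Part 2: speak the same text once the engine is ready.
                tts = new TextToSpeech(this, this);
            }

            @Override
            public void onInit(int status) {
                if (status == TextToSpeech.SUCCESS) {
                    tts.setLanguage(Locale.US);
                    // Pre-API-21 speak() signature, consistent with the API 19 target in Section 2.
                    tts.speak(labelText, TextToSpeech.QUEUE_FLUSH, null);
                }
            }

            @Override
            protected void onDestroy() {
                if (tts != null) tts.shutdown();    // release TTS engine resources
                super.onDestroy();
            }
        }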

  Section 4: GUI interface       

    SHOW ALL interfaces and the interactions that change the interface -- either draw them by hand AND scan them into your Word document, or use a free mock-interface tool; try http://balsamiq.com/



   5) Project Due - Turn in to BB->DiscussionBoard->Project 3-Results

  1. PDF containing:
    • URL to YouTube video: it must show a LIVE demonstration of the program working and describe what is working, what is not working, and how well it works (accuracy -- e.g., 2/10 times, never, 9/10 times, whatever)
    • Screen shots of the program working
    • short description of WHAT IS working
    • short description of WHAT IS NOT working AND WHY
    • Short description of HOW WELL you think your system worked.
  2. zip of code
     
© Lynne Grewe