Sunday, January 26, 2020

Study of Document Layout Analysis Algorithms

Relative Study of Document Layout Analysis Algorithms for Printed Document Images

Divya Kamat, Divya Sharma, Parag Chitale, Prateek Dasgupta

ABSTRACT

This survey paper studies several algorithms that can be used for document layout analysis and compares their results. For the removal of the image mask, Bloomberg’s algorithm and CRLA are described. For text segmentation, we study the Recursive XY Cut, RLSA and RLSO algorithms.

Introduction

Physical layout analysis of printed document images is the first step of OCR conversion. For the OCR to work effectively, its input must contain only text, with no images. If this is not done properly, the OCR returns garbage values. To avoid this, we discuss two algorithms, Bloomberg’s algorithm and CRLA, that can be used to remove images from document images. The next step is text segmentation, in which we find the text blocks inside the document. The coordinates of these text blocks are then passed as input to the OCR. To perform this segmentation, we discuss the recursive XY cut algorithm and the RLSA and RLSO algorithms.

Removal of Image from Document

The first step in document layout analysis is to remove the images present in the original document. We discuss Bloomberg’s algorithm, along with its variations, and the CRLA algorithm for image removal.

Bloomberg’s Algorithm

Bloomberg’s algorithm is primarily used to find the image mask of halftone images. Its implementation uses basic morphological operations. The algorithm has the following steps:

1. The input image is binarized.
2. A 4→1 threshold reduction is performed twice with threshold T=1.
3. A 4→1 threshold reduction is performed with T=4.
4. A 4→1 threshold reduction is performed with T=3.
5. The image is opened with a structuring element of size 5×5.
6. A 1→4 expansion of the image is performed twice.
7. The union of the overlapping components of the seed image obtained from step 6 and the image obtained from step 2 is computed.
8. A dilation with a 3×3 structuring element is applied, followed by a 1→4 expansion performed twice.
9. The halftone mask obtained from step 8 is subtracted from the binarized input image.

The main issue with Bloomberg’s algorithm is that it cannot distinguish between text and sketches (i.e. line drawings) in a printed document image.

Enhanced CRLA Algorithm

CRLA stands for Constrained Run-Length Algorithm. It applies horizontal and vertical smoothing to the document image to obtain a clear separation between text and images. Enhanced CRLA smooths only the text part of the image and avoids smoothing the non-textual part.

Algorithm:

1. Label the connected components in the document image.
2. Classify the components by height:
   height less than or equal to 1 cm: label 1
   height between 1 and 3 cm: label 2
   height greater than 3 cm: label 3
3. Apply horizontal smoothing to the components with label 1 only.
4. Apply vertical smoothing to the components with label 1 only.
5. Logically AND the two images obtained in steps 3 and 4.
6. Apply horizontal smoothing to the output image of the AND operation.
7. Calculate the Mean Black Run Length (MBRL):
   a. Calculate the Black Run Length (BRL) row-wise for the region under consideration.
   b. Maintain a Black-White Transition Count (TC) for the region.
   c. Calculate MBRL = BRL / TC.
8. Calculate the Mean Transition Count (MTC):
   a. Maintain a Black-White Transition Count (TC) for the region.
   b. Calculate W, the width of the region.
   c. Calculate MTC = TC / W.
9. Extract the components with label 1 whose MBRL and MTC values lie in the acceptable range for a typical document image.
10. Apply horizontal smoothing to the components with label 2 only.
11. Apply vertical smoothing to the components with label 2 only.
12. Logically AND the two images obtained in steps 10 and 11.
13. Apply horizontal smoothing to the output image of the AND operation.
14. Calculate MBRL and MTC.
15. Extract the components with labels 2 and 3 whose MBRL and MTC values lie in the acceptable range for a typical document image.

At step 9 we extract the text part of the document image, and at step 15 we extract the non-text part.

The main advantage of the CRLA algorithm is its clear separation of the text and non-text parts of the document image. It works effectively for sketches as well as halftones. Its complexity is considerably lower because smoothing is applied selectively. However, after the non-textual part of the document image is removed, some stray pixels remain in the image. The algorithm assumes that connected components in a halftone image whose height is less than 1 cm are text elements, which leaves unwanted components in the final image.

Text Segmentation

The next step in document layout analysis is the segmentation of text into text blocks that can be provided as input to the OCR. The following algorithms have been studied for this.

Recursive XY Cut algorithm

The recursive XY cut algorithm obtains text blocks from a document image from which the pictures of the original printed document have already been removed. It works in the following way:

1. The bounding boxes of the image are calculated.
2. The horizontal and vertical projections of the image are calculated.
3. X cuts are performed at all valleys in the horizontal projection that are wider than the threshold th.
4. Y cuts are performed between these X cuts at all valleys in the vertical projection that are wider than the threshold tv.
5. Steps 3 and 4 are repeated until no further X or Y cuts are possible in a region.

One problem with the XY cut algorithm is that there is no method for finding a threshold that works for all documents. Instead, a new threshold needs to be determined for each document, and this cannot be done without manual intervention. Another major issue is time complexity: the recursive XY cut algorithm takes a long time to complete. Despite these disadvantages, the algorithm successfully separates the text blocks, provided that a manual threshold is supplied.

RLSA

The Run-Length Smoothing Algorithm (RLSA) works on black-and-white scanned images of documents. It finds runs of white pixels and converts them into black pixels whenever they are shorter than a given threshold. The RLSA works in four steps:

1. Horizontal smoothing: the image is scanned row-wise, and runs of white pixels shorter than a threshold th are replaced by black pixels.
2. Vertical smoothing: the image is scanned column-wise, and runs of white pixels shorter than a threshold tv are replaced by black pixels.
3. The images obtained from steps 1 and 2 are combined with a logical AND.
4. Horizontal smoothing with a threshold ta is applied to the image obtained from step 3.

RLSO

RLSO (Run-Length Smoothing with OR) is a simplified version of the RLSA. It works as follows:

1. Horizontal smoothing: the image is scanned row-wise, and runs of white pixels shorter than a threshold th are replaced by black pixels.
2. Vertical smoothing: the image is scanned column-wise, and runs of white pixels shorter than a threshold tv are replaced by black pixels.
3. The images obtained from steps 1 and 2 are combined with a logical OR.
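
The RLSA and RLSO steps above can be sketched in Python with NumPy. This is only a minimal illustration under assumptions that the text does not state: black pixels are stored as True in a boolean array, only white runs bounded by black pixels on both sides are filled, and the function names smooth_rows, smooth_cols, rlsa and rlso are hypothetical.

```python
import numpy as np


def smooth_rows(img, threshold):
    """Horizontal smoothing: in each row, turn white runs shorter than
    `threshold` black. Assumption: only runs lying between two black
    pixels are filled; runs touching the image border are left alone."""
    out = img.copy()
    for row in out:  # each row is a view, so edits land in `out`
        black = np.flatnonzero(row)
        for a, b in zip(black[:-1], black[1:]):
            if 0 < b - a - 1 < threshold:  # length of the white run
                row[a + 1:b] = True
    return out


def smooth_cols(img, threshold):
    """Vertical smoothing: the same operation applied column-wise."""
    return smooth_rows(img.T, threshold).T


def rlsa(img, th, tv, ta):
    """RLSA: AND the two smoothed images, then smooth rows again with ta."""
    combined = smooth_rows(img, th) & smooth_cols(img, tv)
    return smooth_rows(combined, ta)


def rlso(img, th, tv):
    """RLSO: OR the two smoothed images, with no final smoothing pass."""
    return smooth_rows(img, th) | smooth_cols(img, tv)
```

With th, tv and ta chosen for a given page, the connected black regions of the result would serve as candidate text blocks. The thresholds still have to be supplied by the caller, which is the limitation discussed next.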

The RLSA algorithm returns rectangular frames for documents with Manhattan layouts. The RLSO algorithm, on the other hand, also works well with non-Manhattan layouts. The problem with both RLSA and RLSO is that the smoothing threshold has to be determined manually. Moreover, the required threshold differs from one document image to the next, which makes manual tuning impractical.

Conclusion

We have compared the algorithms described above for document layout analysis. During our research we found that, while Bloomberg’s algorithm has problems with images that contain sketches, CRLA has problems with images that contain extremely small non-textual elements. We also observed that neither the recursive XY Cut algorithm nor RLSA works on printed documents with non-Manhattan layouts. The RLSO algorithm, on the other hand, gives comparatively better results for both Manhattan and non-Manhattan layouts. However, all three segmentation algorithms share the problem of manual, document-specific threshold determination.

References

Syed Saqib Bukhari, Faisal Shafait and Thomas M. Breuel, “Improved Document Image Segmentation Algorithm using Multiresolution Morphology”.
Jaekyu Ha, Robert M. Haralick and Ihsin T. Phillips, “Recursive XY Cut using Bounding Boxes of Connected Components”, Third International Conference on Document Analysis and Recognition, ICDAR, 1995.
Stefano Ferilli, Teresa M.A. Basile and Floriana Esposito, “A histogram-based Technique for Automatic Threshold Assessment in a Run Length Smoothing-based Algorithm”, ACM, 2010.
Hung-Ming Sun, “Enhanced Constrained Run-Length Algorithm for Complex Layout Document Processing”, International Journal of Applied Science and Engineering, 2006.

Saturday, January 18, 2020

Pag-IBIG Fund Essay

Pag-IBIG is an acronym which stands for Pagtutulungan sa Kinabukasan: Ikaw, Bangko, Industria at Gobyerno. In effect, Pag-IBIG harnesses these four sectors of our society to provide its members with adequate housing through an effective savings scheme.

Coverage

These guidelines shall cover the development and construction of low-cost housing units in Metro Manila and highly urbanized cities, and of socialized housing units in the provinces, by Pag-IBIG Fund.

Objectives

To provide low-cost and socialized house and lot packages/condominium units, either for rent or for sale, to low-income Pag-IBIG members who cannot afford the housing packages available in the market.
To enable Pag-IBIG Fund to perform its mandate by using its funds to provide decent and affordable condominium units as well as house and lot packages for sale to eligible Pag-IBIG Fund members nationwide.
To stimulate competition that will bring about better housing packages in terms of price and development, to the benefit not only of Pag-IBIG Fund members but also of the general public.
To help solve the housing backlog by generating further demand for housing through the provision of affordable condominium units and house and lot packages.
To equitably distribute nationwide the economic opportunities generated by housing production and, in the process, stimulate the stability brought about by economic development.
To provide an opportunity for Local Government Units (LGUs) to comply with R.A. 7279 by identifying and providing land for socialized housing.
To simplify and facilitate the processing of end-user financing for eligible Pag-IBIG Fund members, given that the projects are owned by Pag-IBIG Fund.
To develop a further sense of ownership, pride and confidence among members of the Fund, knowing full well that the projects being constructed are direct investments made from their savings with the institution.
To generate more membership in Pag-IBIG Fund.
To develop and dispose of acquired properties of the Fund.

Friday, January 10, 2020

Problems in School Essay

Education is the most important factor in the development of human civilization. It is one of the ways we can achieve our goals in the future. However, many problems have been raised over the years regarding what our school system should be doing to improve education. These problems, which include a lack of self-discipline, longstanding bullying, and the question of school uniforms, should be lessened in order to foster a positive disposition toward education.

Discipline and balance are important in a student’s life. Sometimes, a person who has the freedom to do anything she wants tends to lose self-discipline and the balance between extracurricular activities and academics. There was a time in my life when I thought I would not be able to finish high school because I got distracted by the social life around me. We can only be young once, as the cliché goes. Indeed, I lived my teenage life to the fullest, to the point of over-living it. In fact, I was still in my early teens when I started caring more about my social life than about school. My mother had a hard time straightening me out. However, the consequences of my actions sadly taught me a lesson. I failed some of my classes when I was a freshman. Also, I joined volleyball and cheerleading in my junior year, which made it very hard for me to catch up with our lessons. I was forced to attend after-school tutoring. Having no discipline and no balance between school, sports and social life is something I regret. Yes, I still have a lot to learn, and I acknowledge that school must be prioritized.

Second, bullying has a real negative effect on the victim’s life. Those who are constantly bullied can be pushed to the breaking point, where they could end up hurting themselves or others. When I was a senior, I had a schoolmate who had trouble coping after his parents got divorced. Over a few months, he neglected his schoolwork and got a few face piercings that were prohibited in our school. A few of his classmates became hostile towards him because of the sudden physical changes and his lack of social etiquette. The conflict suddenly escalated when they pushed him onto the ground, kicked him in the stomach, and locked him in the bathroom. In another incident, a few of the school jocks were standing in the hallway joking around when they spotted a smaller classmate struggling to carry his school books. One of the jocks stuck his foot out and deliberately tripped the boy. They all laughed and called the boy names such as “clumsy” and “dork”. Bullies pick on students who they think are physically weak and unpopular with their peers. Whatever form bullying takes, from cyberbullying to physical bullying, it is wrong and it has to stop.

Lastly, clothing has become a means of self-expression, and the way a person dresses usually reflects their personality. The most common issue is that some students are harassed by other students for the way they dress and how they appear. When I was a sophomore, gang violence became a big concern throughout my high school. Gangs choose colors to wear that let people know which gang they belong to. Students who did not know about this wore those colors on dress-down day, and some of them got hurt because they were not aware of the specific gang colors. I have also noticed that students nowadays feel they must have the newest fashion trends and styles. However, not all parents are able to go out of their way to buy their children clothes. So uniforms not only save students from being harassed based on what they wear, but also help parents who are already busy working to provide for our necessities. Uniforms cause children to be more civilized and mature in what they are doing. It is great for schools to implement a policy on school uniforms because it provides more focus on learning, reduces peer pressure, and increases school pride.

Attitude can alter every aspect of a person’s life, including his or her education. Students’ attitudes toward learning determine their ability and willingness to learn. Furthermore, it is never too late to improve our educational system. Every school should be more advanced and provide a good learning environment first, because a highly effective school profoundly enhances students’ prosperity.

Thursday, January 2, 2020

Christianity Developing Faith - 991 Words

Christianity

Historical Facts

Where/How Does This Faith Perspective Originate?

Christianity began within the Jewish faith. In 63 B.C.E., the Roman Empire made its way to Palestine. The Romans stated they would stay out of Jewish affairs as long as the Jews paid their taxes and kept the peace. However, many individuals, the Zealots, desired to overthrow the Romans. The common people believed a new king would save them from Roman tyranny. In 6 C.E., Judas the Galilean led a revolt, but the Romans quickly ended it, at the cost of 2,000 Zealots’ lives. Then in 66 C.E. an open revolt developed in Jerusalem, which the Jews won, but in 70 C.E. the Romans returned, defeated the Jews, and began to make their lives intolerable. It is in this political, social, and spiritual climate that Christianity was born. This history is the special plan of God, which reaches its apex in the birth, life, death, and resurrection of Jesus.

Founder/Followers:

The founder of Christianity is Jesus, who was born outside Bethlehem in a manger in about 6 B.C.E. Apart from a few stories, Jesus’s early life is unknown. It is the last three years of his life that provide the focus for most Christians. Jesus was baptized by John the Baptist in the Judean countryside, which marked the beginning of a ministry centered on preaching, teaching, and healing. Jesus sought out the sick, the poor, the dying, and the sinners. Jesus is also considered the Messiah (the king) and had 12 disciples who were hand-selected.