Scanning text into a computer as an image (.bmp), converting it with an OCR system, and editing the result in Word

Text recognition system  Date: 2021-02-26

Scan the text and save the result to the computer as an image (.bmp). An OCR system is then used to convert the image to text, which is finally edited in Word. Here is how OCR works.

OCR is the abbreviation of Optical Character Recognition: recognizing text by optical means. It is an important area of research and application in automatic recognition technology. OCR software automatically enters text into a computer and is the main software that accompanies a scanner. It is a form of non-keyboard input and needs an image input device, usually a scanner. Today OCR mainly refers to text recognition software. Before 1996, when Thunis began bundling Chinese recognition software with its scanners, scanners and OCR software were sold separately on the market [...]. The software has since been upgraded, and scanner vendors now bundle professional OCR software with the scanners they sell. The rapid development of OCR technology is closely tied to the widespread use of scanners. Over the past two years, as scanners have become common and OCR technology has improved, OCR has become a capable assistant for most scanner users.

I. The development of OCR technology

The first generation of OCR products appeared in the early 1960s. After more than 30 years of continuous development and improvement, research on OCR technology, including handwriting recognition, has made remarkable achievements. Users' requirements have also grown: where they once looked only at the recognition rate, they now expect more of the whole OCR system, including recognition speed, a friendly user interface, simple operation, product stability, adaptability, reliability and scalability, and the quality of pre-sale and customer service.

IBM was the first company to develop OCR products, and in 1965 it exhibited its OCR product, the IBM 1287, at the New York World's Fair. At that time the product could recognize only printed numerals, English letters and some symbols, and only in a specified font. At the end of the 1960s, Hitachi and Fujitsu also developed their own OCR products. The world's first automatic postal sorting system to recognize handwritten postal codes was developed by Toshiba in Japan, and two years later NEC launched a similar system. By 1974 the automatic sorting rate for letters had reached about 92%, and such systems were widely and successfully used in the postal service. In 1983 Toshiba released OCRV595, an OCR system for printed Chinese characters, with a recognition speed of 70 to 100 characters per second and a recognition rate of 99.5%. Toshiba then began research on handwritten Chinese character recognition.

China started OCR research relatively late. Work on recognizing numerals, English letters and symbols began in the 1970s, and research on Chinese character recognition began at the end of the 1970s. In 1986 the information field of the National 863 Program organized three institutions, Tsinghua University, the Beijing Institute of Information Engineering and the Shenyang Institute of Automation, to jointly develop Chinese OCR software. In 1989 Tsinghua University released the first Chinese OCR software, Tsinghua TH-OCR 1.0, and Chinese OCR moved from the laboratory to the market. Tsinghua University later introduced TH-OCR 92, a high-performance, practical, multi-font and multi-function recognition system for printed simplified and traditional Chinese characters, which marked great progress in printed Chinese character recognition. TH-OCR 94, a high-performance recognition system for mixed Chinese and English printed text launched in 1994, was assessed by experts as "the first recognition system for mixed Chinese and English printed text introduced at home or abroad, at an internationally leading level overall".

In the late 1990s the Department of Electronic Engineering of Tsinghua University proposed and carried out a comprehensive study of Chinese character recognition, and achieved important results across printed text recognition, handwritten Chinese character recognition and handwritten symbol recognition. The representative result is the TH-OCR 97 integrated Chinese character recognition system, which can recognize multilingual (Chinese, English and Japanese) printed text, handwritten Chinese characters and handwritten digits. Over the past few years, other OCR software besides Qinghua Tong TH-OCR, such as Shang Shu SH-OCR, has also come out, and the Chinese OCR market has steadily expanded, with users all over the world.

It can be said that OCR for printed text has reached a high level. OCR products have evolved from recognizing only printed numerals, letters and symbols into powerful tools for fast information entry that perform automatic layout analysis and form recognition and handle mixed text with multiple fonts, sizes, and horizontal and vertical layouts. The recognition rate for printed Chinese characters exceeds 98%, and even when print quality is poor it stays above 95%. The products can recognize Song, bold, italic, FangSong and other simplified Chinese fonts, can handle text set in a mix of fonts and sizes, and recognize handwritten Chinese characters at a rate above 70%.

Although Chinese character OCR started late in China, more than ten years of hard work have overcome enormous difficulties such as the size of the Chinese character set, and the recognition speed (the number of characters processed per unit time, from feature extraction to output of the recognized character) can reach 70 characters per second or more. As OCR for printed Chinese characters has matured, OCR products have come into wide use in the press, printing, publishing, libraries, office automation and other industries.

Professional OCR products are geared to a specific industry and suit departments that must process large volumes of forms every day, such as postal services, taxation, customs and statistics. In these industry-specific systems the format is fixed and the character set to be recognized is relatively small, and they are often used together with special input equipment, so they are fast and efficient; mail sorting systems are one example. Handwritten text recognition was not introduced until 1996 and 1997, and then only as an additional feature of printed text recognition products. Because writing habits differ from person to person, recognizing unconstrained handwriting is very difficult, so handwritten OCR is mostly used in the form of online handwriting recognition, in which the computer recognizes characters as they are written, a real-time method.

II. The basic principles of OCR

In brief, the basic principle of OCR is to input an image of a document into a computer through a scanner, and then have the computer extract the image of each character and convert it into a character code. In detail, the scanner converts the light signal from the manuscript into an electrical signal through a charge-coupled device (CCD), and an analog-to-digital converter turns that analog signal into a digital signal that is sent to the computer. The computer thus receives a digital image of the manuscript. The Chinese characters in the image may be printed or handwritten, and the computer then recognizes them. For printed characters, the document is first converted by optical means into a raw black-and-white dot matrix; the recognition software then converts the text in the image into text format, which is passed to word processing software for further editing. Character recognition is the key technology of OCR.
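The conversion of a scanned page into a black-and-white dot matrix can be sketched in Python as a simple global threshold. This is only an illustration under assumptions: the grayscale convention (0 is black, 255 is white), the threshold of 128 and the function name `binarize` are hypothetical and are not taken from any particular OCR product.

```python
def binarize(gray, threshold=128):
    """Turn a grayscale image (rows of 0-255 values) into a 0/1 dot matrix.

    1 marks an ink (dark) pixel and 0 marks background. The threshold is an
    assumed value; real systems usually derive it from the image itself.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


# A tiny 3x4 "scan": the dark pixels form a horizontal stroke in the middle row.
scan = [
    [250, 240, 245, 255],
    [30, 20, 25, 200],
    [255, 250, 245, 240],
]
dots = binarize(scan)
```

Here `dots` holds one row of ink pixels between two blank rows, which is the form the later recognition steps operate on.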

1. Two ways of OCR recognition

Like any other data, the graphic information a scanner captures is recorded and processed in the computer as the binary digits 0 and 1: everything is a string of 0s and 1s representing points or samples. An OCR program recognizes the characters on a page mainly in two ways: pattern matching and feature extraction.

Pattern matching strictly compares each character with stored bitmaps of standard typefaces and font sizes. If the application has a large database of stored characters, it selects the appropriate ones to match against. The software must use some processing techniques to find the closest match, usually by trying different versions of the same character. Some software can scan a page of text and learn each character in order to define a new font. Some software uses its own recognition techniques to identify as many characters on the page as it can, and then lets the user manually select or type in the characters that were not recognized.
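A minimal sketch of pattern matching, assuming characters are small 0/1 bitmaps of equal size and that "closest" means the fewest differing pixels (Hamming distance). The templates, labels and function names are hypothetical; a real product would store many sizes and typefaces per character.

```python
def hamming(a, b):
    """Count the pixels that differ between two equal-sized 0/1 bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def match_template(glyph, templates):
    """Return the label of the stored bitmap closest to `glyph`."""
    return min(templates, key=lambda label: hamming(glyph, templates[label]))


# Hypothetical 3x3 templates for two characters.
TEMPLATES = {
    "I": [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
}

# A scanned "L" with one missing pixel is still closest to the "L" template.
noisy_l = [[1, 0, 0],
           [1, 0, 0],
           [1, 1, 0]]
best = match_template(noisy_l, TEMPLATES)
```

Because the comparison is pixel by pixel, this approach is sensitive to font and size, which is why the stored database has to cover the variants that are expected.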

Feature extraction decomposes each character into many different features, such as diagonals, horizontal lines and curves. These features are then matched against those of known characters. As a simple example, if the application detects two horizontal lines, it will "think" the character may be "二" (two). The advantage of feature extraction is that it can recognize a variety of fonts; for example, Chinese calligraphy is recognized using feature extraction.
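The "two horizontal lines" example can be sketched as follows. This is a toy illustration, not a real recognizer: the only feature is the number of horizontal strokes, a row is assumed to belong to a stroke when at least half of its pixels are ink, and the names `count_horizontal_strokes` and `guess_character` are hypothetical.

```python
def count_horizontal_strokes(bitmap, min_fill=0.5):
    """Count groups of consecutive rows that are mostly ink.

    A row belongs to a horizontal stroke when at least `min_fill` of its
    pixels are ink; adjacent such rows are counted as one stroke.
    """
    strokes = 0
    in_stroke = False
    for row in bitmap:
        is_line = sum(row) >= min_fill * len(row)
        if is_line and not in_stroke:
            strokes += 1
        in_stroke = is_line
    return strokes


def guess_character(bitmap):
    """Map the stroke count to a candidate among 一, 二, 三 (toy rule)."""
    return {1: "一", 2: "二", 3: "三"}.get(count_horizontal_strokes(bitmap))


two = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
candidate = guess_character(two)
```

The stroke count does not depend on the exact stroke width or length, which hints at why feature-based methods tolerate font variation better than strict bitmap comparison.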

Most OCR applications add an intelligent syntax check, which further improves the recognition rate. It corrects spelling and grammar in the recognized text by checking context: the application performs several context checks, comparing the recognized character strings against the fixed phrases and word orders stored in the program. More advanced applications automatically replace words they consider wrong with the words they consider right, correcting the meaning of the sentence.
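A much simplified stand-in for such a check is a word-level lookup against a stored lexicon. The sketch below uses `difflib.get_close_matches` from the Python standard library; the lexicon, the similarity cutoff of 0.8 and the function names are assumptions made for illustration, and a real product would also use phrase and word-order information.

```python
import difflib

# Hypothetical lexicon of words the application knows.
LEXICON = ["optical", "character", "recognition", "scanner", "software"]


def correct_word(word, lexicon=LEXICON, cutoff=0.8):
    """Replace `word` with its closest lexicon entry, if one is close enough."""
    if word in lexicon:
        return word
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text):
    """Apply the word-level check to every whitespace-separated word."""
    return " ".join(correct_word(w) for w in text.split())


# Typical OCR confusions: "0" read for "o" and "l" read for "i".
fixed = correct_text("0ptical character recognltion software")
```

Words with no sufficiently close lexicon entry are left unchanged, so the check never replaces a word it has no evidence about.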

2. Several steps of text recognition

Text recognition includes the following steps: image input, preprocessing, character recognition and post-processing.

(1) Image input

Image input means feeding the document into the computer through an input device, that is, digitizing the original. The most widely used device today is the scanner. The scan quality of the document image is the precondition for correct recognition by OCR software. Choosing the scanning resolution and related parameters properly is the key to keeping the text clear and its features intact. In addition, the document should be placed as straight as possible, so that the skew angle detected during preprocessing is small and skew correction distorts the text image only slightly. These simple measures improve the system's recognition accuracy. Conversely, with improper scan settings the text may have many broken strokes, and only half of a character's image may be detected. When strokes are broken or stuck together, some features are lost, the feature distance increases, and the error rate rises when the features are compared with the feature library.

(2) Preprocessing

After a printed document is scanned, the image of each character must be isolated and passed to the recognition module; this is called image preprocessing. Preprocessing covers the preparation done before character recognition: cleaning the image by removing obvious noise (interference) from the original, measuring the skew angle of the document, analyzing the layout and confirming the selected text regions, segmenting horizontally and vertically set text into lines and separating the image of each character, and distinguishing punctuation. This stage is very important, because its results directly affect the accuracy of text recognition.
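As one example of the cleaning step, the sketch below removes isolated specks from a 0/1 dot matrix. It assumes that an ink pixel with no ink among its eight neighbours is noise; that rule and the function name `remove_isolated_pixels` are illustrative assumptions, and real systems use more careful filters so that small marks such as dots in punctuation are not lost.

```python
def remove_isolated_pixels(bitmap):
    """Clear ink pixels that have no ink pixel among their 8 neighbours."""
    height, width = len(bitmap), len(bitmap[0])
    cleaned = [row[:] for row in bitmap]
    for y in range(height):
        for x in range(width):
            if not bitmap[y][x]:
                continue
            # Count ink pixels in the 3x3 window around (y, x), excluding itself.
            neighbours = sum(
                bitmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                cleaned[y][x] = 0
    return cleaned


noisy = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
clean = remove_isolated_pixels(noisy)
```

The speck in the top-left corner is cleared, while the three connected pixels of the stroke are kept.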

Layout analysis is an overall analysis of the text image: it extracts all text blocks from the document and distinguishes text paragraphs and their typesetting order, as well as image and table regions. For each text block, the region (the coordinates of the start and end points of the block in the image), its attributes (horizontal or vertical layout) and its connections to the other blocks are stored in a data structure that the recognition module uses for automatic recognition. Text regions are recognized directly, table regions are analyzed and recognized by dedicated table processing, and image regions are compressed or simply stored. Character segmentation is the process of splitting the page image into lines and then separating individual characters from each line image.
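Splitting a page into lines and then into characters can be sketched with projection profiles: rows with no ink separate lines, and within a line, columns with no ink separate characters. The sketch assumes horizontally set text on a clean 0/1 dot matrix with blank gaps between characters; the function names and the box format are hypothetical.

```python
def runs(profile):
    """Return (start, end) index pairs of consecutive non-zero entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(page):
    """Split a 0/1 page into text lines, then each line into character boxes.

    Returns a list of lines; each line is a list of (top, bottom, left, right)
    boxes with exclusive bottom and right bounds.
    """
    boxes = []
    row_profile = [sum(row) for row in page]
    for top, bottom in runs(row_profile):
        line = page[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]
        boxes.append([(top, bottom, left, right) for left, right in runs(col_profile)])
    return boxes


page = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
]
layout = segment(page)
```

A purely gap-based split would cut a Chinese character whose left and right components do not touch into two boxes, so practical systems combine it with character width estimates and other cues.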

(3) Character recognition

Character recognition is the core technique of OCR. It takes the character images detected in the scanned page and converts them from graphics into standard character codes; it is the key to making the computer "read", and is also called recognition technology. People recognize characters because information about them, such as their structure and strokes, is already stored in the brain. For a computer to recognize characters, feature information about them must likewise first be stored in the computer, but deciding what information to store and how to obtain it is a very complex matter, and it must yield a recognition rate high enough to be useful. The usual approach is to analyze the strokes, feature points, projection information and regional distribution of the characters. Chinese uses thousands of characters.
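Two of the features named above, projection information and regional distribution, can be sketched for a single character bitmap as follows. The 2x2 zone grid and the function names are assumptions for illustration, and the bitmap is assumed to be at least as large as the grid in each direction.

```python
def zone_densities(bitmap, grid=2):
    """Split the bitmap into grid x grid zones and return each zone's ink ratio."""
    height, width = len(bitmap), len(bitmap[0])
    features = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * height // grid, (gy + 1) * height // grid
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            ink = sum(bitmap[y][x] for y in range(y0, y1) for x in range(x0, x1))
            features.append(ink / ((y1 - y0) * (x1 - x0)))
    return features


def projections(bitmap):
    """Return the ink counts per row and per column."""
    return [sum(row) for row in bitmap], [sum(col) for col in zip(*bitmap)]


glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
density = zone_densities(glyph)
rows, cols = projections(glyph)
```

The resulting numbers form a feature vector that can be stored for each known character and compared with the vector of an unknown one, for example by a distance measure.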
