DocEng 2011: Citation Pattern Matching Algorithms for Citation-based Plagiarism Detection

Video Link: https://www.youtube.com/watch?v=cNzMFvY_FXw



Duration: 28:12


The 11th ACM Symposium on Document Engineering
Mountain View, California, USA
September 19-22, 2011

Citation Pattern Matching Algorithms for Citation-based Plagiarism Detection: Greedy Citation Tiling, Citation Chunking and Longest Common Citation Sequence
Bela Gipp, Norman Meuschke
Presented by Bela Gipp.

ABSTRACT

Plagiarism detection systems have been developed to locate instances of plagiarism, e.g., within scientific papers. Studies have shown that existing approaches deliver reasonable results in identifying copy-and-paste plagiarism but fail to detect more sophisticated forms such as paraphrased, translated, or idea plagiarism. The authors of this paper demonstrated in recent studies that the detection rate can be significantly improved by not relying on text analysis alone, but by additionally analyzing the citations of a document. Citations are valuable, language-independent markers that act much like a fingerprint. In fact, our examinations of real-world cases have shown that the order of citations in a document often remains similar even if the text has been strongly paraphrased or translated in order to disguise plagiarism. This paper introduces three algorithms and discusses their suitability for Citation-based Plagiarism Detection. Because plagiarism can occur in numerous ways, these algorithms need to be versatile: they must be capable of detecting transpositions, scaling, and combinations, in both local and global form. The algorithms are named Greedy Citation Tiling, Citation Chunking, and Longest Common Citation Sequence. The evaluation showed that common forms of plagiarism can be detected reliably if these algorithms are combined.
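To make the idea of citation order as a fingerprint concrete, here is a minimal sketch of a longest-common-subsequence comparison over two citation sequences, which is the general idea the name Longest Common Citation Sequence suggests. It is not the authors' implementation. It assumes each document has already been reduced to a list of reference identifiers in order of appearance; the function name and the sample identifiers are hypothetical.

```python
from typing import List, Sequence


def longest_common_citation_sequence(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return one longest sequence of citations that appears in the same
    order in both documents (not necessarily contiguously)."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the start to recover one optimal sequence.
    result: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical citation sequences: the suspect document inserts two
# unrelated citations (X, Y) and swaps the order of C and D.
original = ["A", "B", "C", "D", "E", "F"]
suspect = ["X", "A", "B", "Y", "D", "C", "E", "F"]

shared = longest_common_citation_sequence(original, suspect)
print(len(shared), "of", len(original), "citations appear in the same order")
```

In this toy case five of the six citations keep their relative order despite the inserted citations and the local transposition. A purely global measure like this one cannot localize a short copied passage inside a long document, which is consistent with the abstract's point that the algorithms work best in combination.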




Other Videos By Google TechTalks


2011-10-05  2011 Frontiers of Engineering: Additive Manufacturing in Aerospace
2011-10-04  2011 Frontiers of Engineering: Challenges and Opportunities for Low-Carbon Buildings
2011-10-04  2011 Frontiers of Engineering: Multi-Scale Modeling of Sustainable Buildings
2011-10-04  2011 Frontiers of Engineering: Accelerating Green Building Market Transformation with IT
2011-10-04  2011 Frontiers of Engineering: Where Are the Emerging Frontiers in Research and Innovation?
2011-10-04  DocEng 2011: Reflowable Documents Composed from Pre-rendered Atomic Components
2011-10-04  DocEng 2011: Paginate Dynamic and Web Content
2011-10-04  DocEng 2011: Document Visual Similarity Measure For Document Search
2011-10-04  DocEng 2011: A Versatile Model for Web Page Representation
2011-10-03  DocEng 2011: Expressing Conditions in Tailored Brochures for Public Administration
2011-10-03  DocEng 2011: Citation Pattern Matching Algorithms for Citation-based Plagiarism Detection
2011-10-03  DocEng 2011: A Study of the Interaction of Paper Substrates on Printed Forensic Imaging
2011-09-30  2011 Frontiers of Engineering: Automatic Text Understanding of Content and Text Quality
2011-09-29  DocEng 2011: Interoperable Metadata Semantics
2011-09-29  2011 Frontiers of Engineering: Large Scale Visual Semantic Extraction
2011-09-29  2011 Frontiers of Engineering: Advancing Natural Language Understanding
2011-09-28  2011 Frontiers of Engineering: Research at Google Lightning Talks
2011-09-28  DocEng 2011: An Efficient Language-Independent Method to Extract Content from News Webpages
2011-09-28  DocEng 2011: Dynamic Assistance to Adding Dimensions to Multi-structured Documents
2011-09-28  DocEng 2011: Component-based Hypervideo Model
2011-09-28  2011 U.S. Frontiers of Engineering: Welcome and Opening Remarks



Tags:
google tech talk
doceng2011
document engineering