Exploring XML Web collections with DescribeX
As Web applications mature and evolve, the nature of the semistructured data that drives these applications also changes. An important trend is the need for increased flexibility in the structure of Web documents. Hence, applications cannot rely solely on schemas to provide the complex knowledge nee...
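The gap between what a schema permits and what a collection actually contains can be made concrete with a small sketch. The Python below groups the elements of two made-up RSS-like documents by their root-to-node label path and counts how many nodes and how many documents fall under each path. This is one of the simplest kinds of structural summary and is offered only as an illustration under stated assumptions: it is not DescribeX, it does not use AxPREs, and the sample documents and the function name label_path_summary are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two hypothetical RSS-like documents: both could pass a lax schema,
# yet they do not use the same element structure.
DOCS = [
    "<rss><channel><title>A</title><item><title>x</title></item>"
    "<item><title>y</title><enclosure/></item></channel></rss>",
    "<rss><channel><title>B</title><item><title>z</title>"
    "<author>n</author></item></channel></rss>",
]


def label_path_summary(xml_docs):
    """Group element nodes by their root-to-node label path.

    Returns {path: {"count": total nodes on that path,
                    "docs": number of documents containing the path}}.
    This is a plain label-path partition used for illustration only;
    it is not the DescribeX implementation.
    """
    counts = defaultdict(int)
    docs_with_path = defaultdict(set)

    def walk(node, prefix, doc_id):
        path = prefix + "/" + node.tag
        counts[path] += 1
        docs_with_path[path].add(doc_id)
        for child in node:
            walk(child, path, doc_id)

    for doc_id, text in enumerate(xml_docs):
        walk(ET.fromstring(text), "", doc_id)

    return {
        path: {"count": counts[path], "docs": len(docs_with_path[path])}
        for path in counts
    }


summary = label_path_summary(DOCS)
for path in sorted(summary):
    print(path, summary[path])
```

Paths that occur in only one of the two documents (here the enclosure and author children of item) are the kind of variation that a lax schema alone would not reveal.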
Main author: | |
---|---|
Published: |
2010
|
Subjects: | |
Online access: | https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_15591131_v4_n3_p_Consens http://hdl.handle.net/20.500.12110/paper_15591131_v4_n3_p_Consens |
Contributed by: |
id |
paper:paper_15591131_v4_n3_p_Consens |
---|---|
record_format |
dspace |
spelling |
paper:paper_15591131_v4_n3_p_Consens 2023-06-08T16:23:22Z Exploring XML Web collections with DescribeX Vaisman, Alejandro Ariel Semistructured data Structural summaries XML XPath Cardinalities Complex mapping Increased flexibility Industry standards Open content Podcasting Regular expressions Schemas Semi structured data Structural summaries Structural summary WEB application Web collections Web document Web document collection XPath queries Markup languages Rough set theory World Wide Web XML As Web applications mature and evolve, the nature of the semistructured data that drives these applications also changes. An important trend is the need for increased flexibility in the structure of Web documents. Hence, applications cannot rely solely on schemas to provide the complex knowledge needed to visualize, use, query and manage documents. Even when XML Web documents are valid with regard to a schema, the actual structure of such documents may exhibit significant variations across collections for several reasons: the schema may be very lax (e.g., RSS feeds), the schema may be large and different subsets of it may be used in different documents (e.g., industry standards like UBL), or open content models may allow arbitrary schemas to be mixed (e.g., RSS extensions like those used for podcasting). For these reasons, many applications that incorporate XPath queries to process a large Web document collection require an understanding of the actual structure present in the collection, and not just the schema. To support modern Web applications, we introduce DescribeX, a powerful framework that is capable of describing complex XML summaries of Web collections. DescribeX supports the construction of heterogeneous summaries that can be declaratively defined and refined by means of axis path regular expressions (AxPREs). AxPREs provide the flexibility necessary for declaratively defining complex mappings between instance nodes (in the documents) and summary nodes. 
These mappings are capable of expressing order and cardinality, among other properties, which can significantly help in the understanding of the structure of large collections of XML documents and enhance the performance of Web applications over these collections. DescribeX captures most summary proposals in the literature by providing (for the first time) a common declarative definition for them. Experimental results demonstrate the scalability of DescribeX summary operations (summary creation, as well as refinement and stabilization, two key enablers for tailoring summaries) on multi-gigabyte Web collections. © 2010 ACM. Fil: Vaisman, A.A. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina. 2010 https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_15591131_v4_n3_p_Consens http://hdl.handle.net/20.500.12110/paper_15591131_v4_n3_p_Consens |
institution |
Universidad de Buenos Aires |
institution_str |
I-28 |
repository_str |
R-134 |
collection |
Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA) |
topic |
Semistructured data; Structural summaries; XML; XPath; Cardinalities; Complex mapping; Increased flexibility; Industry standards; Open content; Podcasting; Regular expressions; Schemas; Semi structured data; Structural summaries; Structural summary; WEB application; Web collections; Web document; Web document collection; XPath queries; Markup languages; Rough set theory; World Wide Web; XML |
description |
As Web applications mature and evolve, the nature of the semistructured data that drives these applications also changes. An important trend is the need for increased flexibility in the structure of Web documents. Hence, applications cannot rely solely on schemas to provide the complex knowledge needed to visualize, use, query and manage documents. Even when XML Web documents are valid with regard to a schema, the actual structure of such documents may exhibit significant variations across collections for several reasons: the schema may be very lax (e.g., RSS feeds), the schema may be large and different subsets of it may be used in different documents (e.g., industry standards like UBL), or open content models may allow arbitrary schemas to be mixed (e.g., RSS extensions like those used for podcasting). For these reasons, many applications that incorporate XPath queries to process a large Web document collection require an understanding of the actual structure present in the collection, and not just the schema. To support modern Web applications, we introduce DescribeX, a powerful framework that is capable of describing complex XML summaries of Web collections. DescribeX supports the construction of heterogeneous summaries that can be declaratively defined and refined by means of axis path regular expressions (AxPREs). AxPREs provide the flexibility necessary for declaratively defining complex mappings between instance nodes (in the documents) and summary nodes. These mappings are capable of expressing order and cardinality, among other properties, which can significantly help in the understanding of the structure of large collections of XML documents and enhance the performance of Web applications over these collections. DescribeX captures most summary proposals in the literature by providing (for the first time) a common declarative definition for them. 
Experimental results demonstrate the scalability of DescribeX summary operations (summary creation, as well as refinement and stabilization, two key enablers for tailoring summaries) on multi-gigabyte Web collections. © 2010 ACM. |
author |
Vaisman, Alejandro Ariel |
title |
Exploring XML Web collections with DescribeX |
publishDate |
2010 |
url |
https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_15591131_v4_n3_p_Consens http://hdl.handle.net/20.500.12110/paper_15591131_v4_n3_p_Consens |