Exploring XML Web collections with DescribeX

As Web applications mature and evolve, the nature of the semistructured data that drives these applications also changes. An important trend is the need for increased flexibility in the structure of Web documents. Hence, applications cannot rely solely on schemas to provide the complex knowledge nee...

Bibliographic Details
Main author:	Vaisman, Alejandro Ariel
Published:	2010
Subjects:	Semistructured data; Structural summaries; XML; XPath; Cardinalities; Complex mapping; Increased flexibility; Industry standards; Open content; Podcasting; Regular expressions; Schemas; Semi structured data; Structural summary; WEB application; Web collections; Web document; Web document collection; XPath queries; Markup languages; Rough set theory; World Wide Web
Online access:	https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_15591131_v4_n3_p_Consens http://hdl.handle.net/20.500.12110/paper_15591131_v4_n3_p_Consens
Contributed by:	Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA), Universidad de Buenos Aires

id	paper:paper_15591131_v4_n3_p_Consens
record_format	dspace
spelling	paper:paper_15591131_v4_n3_p_Consens 2023-06-08T16:23:22Z Affiliation: Vaisman, A.A. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina.
institution	Universidad de Buenos Aires
institution_str	I-28
repository_str	R-134
collection	Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
topic	Semistructured data; Structural summaries; XML; XPath; Cardinalities; Complex mapping; Increased flexibility; Industry standards; Open content; Podcasting; Regular expressions; Schemas; Semi structured data; Structural summary; WEB application; Web collections; Web document; Web document collection; XPath queries; Markup languages; Rough set theory; World Wide Web
description	As Web applications mature and evolve, the nature of the semistructured data that drives these applications also changes. An important trend is the need for increased flexibility in the structure of Web documents. Hence, applications cannot rely solely on schemas to provide the complex knowledge needed to visualize, use, query and manage documents. Even when XML Web documents are valid with regard to a schema, the actual structure of such documents may exhibit significant variations across collections for several reasons: the schema may be very lax (e.g., RSS feeds), the schema may be large and different subsets of it may be used in different documents (e.g., industry standards like UBL), or open content models may allow arbitrary schemas to be mixed (e.g., RSS extensions like those used for podcasting). For these reasons, many applications that incorporate XPath queries to process a large Web document collection require an understanding of the actual structure present in the collection, and not just the schema. To support modern Web applications, we introduce DescribeX, a powerful framework that is capable of describing complex XML summaries of Web collections. DescribeX supports the construction of heterogeneous summaries that can be declaratively defined and refined by means of axis path regular expressions (AxPREs). AxPREs provide the flexibility necessary for declaratively defining complex mappings between instance nodes (in the documents) and summary nodes. These mappings are capable of expressing order and cardinality, among other properties, which can significantly help in the understanding of the structure of large collections of XML documents and enhance the performance of Web applications over these collections. DescribeX captures most summary proposals in the literature by providing (for the first time) a common declarative definition for them. Experimental results demonstrate the scalability of DescribeX summary operations (summary creation, as well as refinement and stabilization, two key enablers for tailoring summaries) on multi-gigabyte Web collections. © 2010 ACM.
author	Vaisman, Alejandro Ariel
title	Exploring XML Web collections with DescribeX
publishDate	2010
url	https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_15591131_v4_n3_p_Consens http://hdl.handle.net/20.500.12110/paper_15591131_v4_n3_p_Consens
work_keys_str_mv	AT vaismanalejandroariel exploringxmlwebcollectionswithdescribex
_version_	1768544566779052032
