Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Algebra equivalence or superset #79

Open
kjetilk opened this issue Mar 9, 2016 · 11 comments
Open

Algebra equivalence or superset #79

kjetilk opened this issue Mar 9, 2016 · 11 comments
Assignees

Comments

@kjetilk
Copy link
Contributor

kjetilk commented Mar 9, 2016

@RubenVerborgh and I had a discussion today about different abstraction levels, and their consequences further down the stack, and I came to the conclusion that it would be interesting to support something motivated from the following.

Say that you had some kind of a service that could efficiently answer things like

?s dct:identifier "965363" ;
    foaf:name ?name .

or

?s dct:title ?title .
FILTER regex(str(?title), "^Dahuts")

The main point being that there is a clear pattern that can be detected and that a plan can be generated for just this pattern. Of course, we can do that now, but I was thinking of ways to do it in less code and more concisely.

My idea was then that even though query containment is complex, it is likely to be used in very constrained ways, so a simple method to check if two patterns are the same, so that I could do stuff in plans_for_algebra like

my $bgp = Attean::Algebra::BGP->new(triples => [triplepattern(variable('s') ... , ...), ...]);
if ($algebra->isa('Attean::Algebra::BGP') && $algebra->equiv($bgp)) {
  push(@plans, AtteanX::Plan::FooBar->new(...));
}

Does this sound like a good idea?

@kasei
Copy link
Owner

kasei commented Mar 9, 2016

Yeah, that sounds good. The one challenge to making this really useful in planning is that you won't always get the algebra you want, even if it's equivalent to what you want. Expanding your second example a bit:

?s a ?type ; dct:title ?title .
FILTER regex(str(?title), "^Dahuts")

Let's say your system can handle just the dct:title triple and the FILTER, but not the rdf:type triple. Your planner is going to get a 2-triple BGP wrapped in a filter instead of the equivalent plan that you might prefer, a 1-triple BGP wrapped in a filter joined with another 1-triple BGP.

This is a general problem and one that's hard to solve. There are just so many equivalent rewritings of a query. I'm not sure if we should be looking towards letting extension code try to express a preference for what patterns they want and try to rewrite the query to support that, or if the extension might pick apart the algebra, handle what it can, and hand back the remaining partial algebra for re-planning.

Doing the simple stuff should be doable, though. I'll think about it and try to hack something together.

@kjetilk
Copy link
Contributor Author

kjetilk commented Mar 10, 2016

Right!

So, wrapping it in curly brackets, like:

{ 
  ?s a ?type . 
  { 
    ?s dct:title ?title .
    FILTER regex(str(?title), "^Dahuts")
  }
}

might that help? It would require the query author to understand what backends would be available, though.

@kasei
Copy link
Owner

kasei commented Mar 10, 2016

Yeah, that would probably work, but not in all cases, and is exactly the sort of thing I want to avoid (requiring users to understand the implementation choices).

@kasei
Copy link
Owner

kasei commented Mar 10, 2016

How "equal" do you want this equality test to be? Identical structures? Canonical term values are equal? Equal after canonical variable naming and triple pattern ordering?

@kjetilk
Copy link
Contributor Author

kjetilk commented Mar 10, 2016

It sounds to me that canonical variable naming and triple pattern ordering is the easiest to implement, so we can go for that, I think…

@kasei
Copy link
Owner

kasei commented Mar 10, 2016

Well... the easiest would be identical structures (triples in the same order, variables named the same). That probably isn't super useful, though...

@kjetilk
Copy link
Contributor Author

kjetilk commented Mar 10, 2016

Ah, right! No, using the same variable names wouldn't be terribly useful ;-)

@kjetilk
Copy link
Contributor Author

kjetilk commented Jul 25, 2017

Just a little loudthinking around this subject:

Since the main problem with the curly brackets solution is that the query author has to understand what the query engine can do, I was thinking about heuristics that might often help on the engine. Then, I thought about some heuristics, say that we had a dictionary that recorded parts of the query that could be answered efficiently, like for example, for

    ?s dct:title ?title .
    FILTER regex(str(?title), "^Dahuts")

we would record

    ?s dct:title ?title .

and

    FILTER regex(str(?title), "^Dahuts")

separately in the dictionary. Then, as early as possible, for example at parse time, the parser would look up in the dictionary and see if any other part of the query matches the rest of the pattern, and then construct the algebra appropriately, to make it easier for the planner to construct the wanted query plan.

Might that work?

@RubenVerborgh
Copy link

Makes me think of the Bucket Algorithm by Levy et al., where you also consider other constraints instead of only relations.

@kasei kasei assigned kasei and unassigned kasei Jan 9, 2019
@kjetilk
Copy link
Contributor Author

kjetilk commented May 13, 2019

We've started to talk about shapes now, and I'm wondering if using shapes to group patterns in such a way that this becomes feasible is an option... Like, the shapes could be used to choose certain algebras over others?

@kasei
Copy link
Owner

kasei commented May 13, 2019

@kjetilk I think shapes probably have a use here. Also possibly relevant is the framing work that's been going on with JSON-LD. I'm not very swapped in on that, but I wonder if there might be applications of framing on query patterns. That could allow a general solution for the problem of planners expressing how they expect/prefer the query patterns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants