Registered Member
|
I think we'd want to keep the ability to annotate any file format (I have some DjVu files, for example), but it should also be possible to export an annotated PDF to share with others. |
Registered Member
|
As TheBlackCat said, yes. The Akonadi collection would be a representation of a bibliography file such as a BibTeX or BibLaTeX file (the code responsible for keeping the collection and the file in sync is called the backend, and there can be backends for different file formats). In my opinion, the best way to go with PDF files would be for the bibliography manager to query Nepomuk for PDF files with metadata corresponding to the citation. That way, the files would not be part of the collection as such.
Poppler (which Okular uses) does not support annotations currently, so exporting them is not possible at the moment. Adding notes to the reference is no problem if the backend supports comments (for example, BibLaTeX has an "annotation" field). If the file format doesn't support comments, they can probably be included in the collection somehow but cannot be exported.
KBibTeX for KDE4 is actually not as tied to BibTeX as the name suggests. The import filters pass the data to the program as a generic dictionary (key-value pairs) which can in principle contain any keys. The UI is probably designed for BibTeX, though, but as most bibliographies have a similar format I think it could be adapted to other backends too. At least the import and export filters (which are separated from the rest of the code) could definitely be reused. |
Registered Member
|
From what I have heard, the git version of Poppler, or maybe even a release version by this point, supports annotations. See here. Okular, though, does not support this feature currently (although Evince does).
Man is the lowest-cost, 150-pound, nonlinear, all-purpose computer system which can be mass-produced by unskilled labor.
-NASA in 1965 |
Registered Member
|
I have evaluated TheBlackCat's ideas a bit and give my comments here.
Zotero looks for a DOI string in the PDF files (using poppler) and uses dx.doi.org to get the publisher's page. If a DOI string is not found, it takes some random string from the article and tries to find the publisher's page with Google Scholar. It doesn't really extract the data from the articles but from the web pages. We could do the same. Strigi indexes the full text content of PDF files and searching for DOI strings would be fast. The bibliography manager application could have a tool "Find citations from all PDF files on your disk".
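Just to sketch the idea (this is only an illustration: the DOI regex is simplified, and it assumes poppler's pdftotext utility is installed and that dx.doi.org answers content-negotiation requests for BibTeX), the scan could look something like this:

```python
import re
import subprocess
import urllib.request

# A simplified DOI pattern; real-world DOIs are messier than this.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')

def find_doi(pdf_path):
    """Extract text with poppler's pdftotext and return the first DOI-like string."""
    text = subprocess.run(["pdftotext", pdf_path, "-"],
                          capture_output=True, text=True).stdout
    match = DOI_RE.search(text)
    return match.group(0) if match else None

def fetch_bibtex(doi):
    """Ask dx.doi.org for citation data via content negotiation."""
    req = urllib.request.Request("https://dx.doi.org/" + doi,
                                 headers={"Accept": "application/x-bibtex"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    doi = find_doi("article.pdf")
    if doi:
        print(fetch_bibtex(doi))
```

The "Find citations from all PDF files on your disk" tool would then just loop over whatever PDFs Strigi/Nepomuk report and feed anything without a DOI to the Google Scholar fallback.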
It is not realistic to have Strigi do this, but Strigi already indexes the full text content of files, and that is useful (see above).
Using Akonadi would be a good idea. Bibliography files would be collections and the entries would be the items of the collection. Different backends (BibTeX, BibLaTeX, Tellico XML, etc.) could use the same class representation for a bibliography entry, allowing the bibliography manager to support different formats natively. I don't believe it'd be a good idea to try to incorporate the full-text PDF files into the collection; instead, keep them somewhere on disk and fetch them with Nepomuk on demand.
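Roughly, the shared representation could be as simple as this (the class and field names are only an example, not a fixed schema):

```python
# A generic bibliography entry: a typed id plus key-value fields.
# Each backend (BibTeX, BibLaTeX, Tellico XML, ...) would translate
# its native syntax to and from this representation.

class BibEntry:
    def __init__(self, entry_id, entry_type, fields):
        self.entry_id = entry_id      # e.g. the BibTeX citation key
        self.entry_type = entry_type  # e.g. "article", "book"
        self.fields = dict(fields)    # e.g. {"title": ..., "author": ..., "doi": ...}

# What a BibTeX backend might produce for one entry:
entry = BibEntry("doe2010", "article", {
    "author": "Jane Doe",
    "title": "An Example Paper",
    "journal": "Journal of Examples",
    "year": "2010",
})
```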
Akonadi cannot be used for online indexes because an Akonadi backend must provide a retrieveItems() function that gives a list of all items in the collections. With PubMed for example that is not possible.
Yes. And as pointed out, the annotation capabilities of Okular could be used.
Good idea. I think this could consist of two parts. (1) A kind of library for retrieving citation info from a given URL (with a plugin system for specialized extractors for each publisher, like Zotero). (2) A KPart::Plugin for Konqueror that uses the library to look for interesting pages. The library would not only be used by the plugin but also by the tool that scans for DOI strings from PDF files and looks for citation info from the publisher's page with dx.doi.org.
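To make (1) a bit more concrete, the dispatching part of such a library could look roughly like this (the class and function names are made up for illustration, not an actual API):

```python
import re

class PublisherPlugin:
    """Base class for publisher-specific citation extractors."""
    url_pattern = None  # regex matched against the page URL

    def extract(self, html):
        """Return a dict of citation fields parsed from the page."""
        raise NotImplementedError

class ExampleJournalPlugin(PublisherPlugin):
    url_pattern = re.compile(r"journals\.example\.org/article/")

    def extract(self, html):
        # A real plugin would parse meta tags or the page structure here.
        return {"title": "...", "author": "...", "doi": "..."}

PLUGINS = [ExampleJournalPlugin()]

def extract_citation(url, html):
    """Find the first plugin whose pattern matches the URL and let it extract."""
    for plugin in PLUGINS:
        if plugin.url_pattern.search(url):
            return plugin.extract(html)
    return None
```

The Konqueror plugin and the DOI scanner would both call something like extract_citation() and feed the result to the bibliography collection.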
Well, I'm not sure how useful a KIO slave would be. Perhaps it could be an alternate way to search and browse your bibliographies.
I'm not sure what you mean by "integrate papers into office documents"... Do you mean adding citations to a paper you are writing? That'd be useful, I suppose. Although I prefer using LaTeX.
Right.
I'm not sure what you mean. Do you mean the layout of how the bibliography appears in the bibliography viewer, or how your citations appear at the end of your office documents? |
Registered Member
|
It seems you are correct, but as you said both approaches would be possible.
I would think this could be done automatically. If Strigi finds a DOI string when scanning PDFs, it would automatically attempt to retrieve the data for it and add it to the document's metadata. I suppose the button would still be needed for the Google Scholar phrase search.
Which wouldn't be realistic? Using the layout, or recording the citations in the paper? I see how using the layout might not be feasible, but keeping track of citations is essential (and one of the big benefits of Nepomuk is that it keeps track of relationships between data, so I would think this would be natural for it).
I agree.
Why couldn't it be told to, for instance, retrieve items 0 through 100 of a particular search? I would think this sort of thing would be essential, for instance for handling RSS feeds (you aren't going to retrieve every single blog post from a blog with thousands or even tens of thousands of posts).
Yes, that was my intention.
Yes, that is what I mean. A lot of people don't know LaTeX, and to them having all these citations would be next to useless if there were no easy way to add the references to their papers. This is an essential feature.
The layout of the bibliography in the paper. Journals have their own bibliography and inline reference formats they require. This would have a database of reference formats (probably using GHNS for easy sharing), as well as a WYSIWYG interface for designing new ones from scratch. These could then be directly incorporated into the paper or be used by the latex engine.
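Just to make the idea concrete, in the simplest case a reference format could boil down to a pair of templates like this (a deliberately naive sketch; a real style system would also need name formatting, sorting rules, punctuation options, etc.):

```python
# One hypothetical journal style: how an inline citation and a
# bibliography entry should be rendered.
style = {
    "inline": "({author}, {year})",
    "bibliography": "{author}. {title}. {journal} ({year}).",
}

entry = {
    "author": "Doe, J.",
    "title": "An Example Paper",
    "journal": "Journal of Examples",
    "year": "2010",
}

print(style["inline"].format(**entry))        # (Doe, J., 2010)
print(style["bibliography"].format(**entry))  # Doe, J.. An Example Paper. ...
```

Styles expressed as data like this would be easy to share via GHNS and to build in a WYSIWYG designer.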
Man is the lowest-cost, 150-pound, nonlinear, all-purpose computer system which can be mass-produced by unskilled labor.
-NASA in 1965 |
Registered Member
|
The thing is that indexing complicated files such as PDFs is normally done with "endanalyzers", and you can only have one of those per file at a time. By "not realistic" I meant that modifying the built-in PDF analyzer is not feasible. Admittedly, one could add a second analyzer if it's a "streamthroughanalyzer", but then the whole file would have to be loaded into memory. A limit on the file size would then be needed, and that's not optimal. Publishers usually seem to list the paper's citations on the article's website, so I'd prefer extracting them from there. Even if we don't modify Strigi, we can have the bibliography application query Nepomuk for new PDFs on the disk and automatically fetch metadata for them.
That's just not how Akonadi is designed. It's for personal information management, and that normally involves finite size collections.
Ok. I don't know much about WYSIWYG editors. |
Registered Member
|
Sorry for bumping an old topic, but how is this project going? Is someone working on what was discussed here?
A bib management system as described here would be awesome. I would love to see this come to life. |
Registered Member
|
As far as I am aware no one is working on this or is planning on working on this, unfortunately.
Man is the lowest-cost, 150-pound, nonlinear, all-purpose computer system which can be mass-produced by unskilled labor.
-NASA in 1965 |
Registered Member
|
Actually, this is not entirely true. I did write some code a couple of months ago and managed to demonstrate some of the concepts we were discussing. Unfortunately, I then got busier and abandoned the project for the time being. My plan was to come up with something ready to be used in one way or another, to attract other people to join the effort, but I ended up leaving the code in a somewhat messy state. Let's see what my code does:

Akonadi resource and serializer
Akonadi resources convert data between file backends (e.g. a BibTeX or BibLaTeX file) and a class representation that can be used in applications. I had to decide what this class representation would be and ended up using the Bibliographic Ontology (Bibo), because I wanted to stay as independent from a specific file format (such as BibLaTeX) as possible. So the resource offers the data basically as RDF triples using the Bibo properties. There is no perfect one-to-one correspondence between Bibo and BibTeX or BibLaTeX, so there will be limitations in BibLaTeX support, but I think this approach is better than using one file format as the native data format... At the present state the BibTeX/BibLaTeX backend is more or less working, but support for more fields should be added, conversion from BibTeX syntax to UTF-8 should be improved, and there are bugs. Also, so far the resource doesn't react to changes in the backend.

Konqueror plugin and citation extractor
In its present state, the Konqueror plugin is able to extract citation metadata from Science, Nature, ACS, APS, AIP, Wiley, Springer and Elsevier journals and save it to Akonadi. The plugin itself is very simple and most of the code is in a library called WebExtractor, which uses WebKit and Python plugins to parse the publication websites. Each publisher has its own plugin (a rough sketch of such a plugin follows at the end of this post). Compared to Zotero translator plugins, which can be very confusing, my Python scripts are very simple, straightforward and quick to write. The library is independent from the browser, so it could in principle be added as a plugin to any browser. The plugin still doesn't save the actual PDF file. The saving itself would be simple to implement, but the PDF would not be useful without attached Nepomuk metadata that makes it easy to connect a citation and the corresponding PDF file. Which brings me to the main problem: there is no bibliographic ontology in Nepomuk. It is possible to install Bibo into Nepomuk and use it, but that would not be a long-term solution. The ontology is used to represent the data internally, so if Nepomuk later gets something like Bibo and we want to switch to it, all the code would have to be modified. The right thing to do would be to contact the Nepomuk team and propose a bibliographic ontology, but that wouldn't make sense right now because the development of my project is stalled.

GUI for managing the citations
I have some ideas for it, but I never actually planned to write one because it'd be too much work. I thought maybe someone else would do it. However, I wrote a widget to view the entries of a bibliography. Maybe when Qt5 comes with the capability to use QML for desktop apps, the effort of writing a GUI becomes smaller.

Final words
I managed to keep the number of lines of code rather small, so this could be a nice starting point for anyone who wants to try his/her hand at making a bibliography management system. Though, there are no comments in the code and things could use some cleaning up... If someone manages to make some progress with it, I would most likely join to help.
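To give an impression of what one of those publisher plugins looks like, here is a rough sketch (the hook names and fields are only illustrative, and I'm using BeautifulSoup here for brevity; the real plugins go through WebKit and the actual interface differs a bit):

```python
from bs4 import BeautifulSoup  # assumes the page HTML is available as a string

def can_handle(url):
    """Claim pages from one (fictional) publisher."""
    return "journals.example.org" in url

def extract(html):
    """Pull citation metadata out of the page's <meta> tags
    (many publishers expose citation_* meta tags)."""
    soup = BeautifulSoup(html, "html.parser")

    def meta(name):
        tag = soup.find("meta", attrs={"name": name})
        return tag.get("content") if tag else None

    return {
        "title":   meta("citation_title"),
        "author":  meta("citation_author"),
        "journal": meta("citation_journal_title"),
        "year":    meta("citation_date"),
        "doi":     meta("citation_doi"),
    }
```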
I don't have an online source repository, but the code is available from me on request... |
Registered Member
|
There is news! It seems that Joerg is implementing something similar to what is described in this topic. Take a look at http://joerg-weblog.blogspot.com/2012/0 ... ow-me.html
Joan |
Registered Member
|
More news where Jörg meets Tuukka: http://joerg-weblog.blogspot.com/2012/0 ... ction.html
Good work guys! |
Registered Member
|
I think BibLaTeX supports Unicode.
|