
Proposal for a bibliographic system

RGB
Registered Member
Posts
341
Karma
0
OS


Tue Sep 14, 2010 8:39 pm
No idea if this is feasible or not, but here is my idea (a by-product of a night of insomnia... ;) )
All scientific articles (at least, that's true in physics) in all the journals I know have the same structure, differing only in formatting details: a heading with the article title, the list of authors, the abstract and the body text (usually in two columns). Headers and footers can also contain the journal name and issue, page number, etc.
Because this layout is so standard, it could be possible to use some reg-ex magic to automatically extract a lot of data about the article.
So here starts the idea. Or two ideas:
1- Suppose you opened a pdf with the article in Okular. A button/menu called "send to bibliographic database" starts the reg-ex magic; after that, a dialog asks for the missing info, a bibliographic database is fed with all those bits, and finally a copy of the pdf is sent to a proper folder, classifying the articles according to the collected data (by field of study / author / ...)
2- By dragging the pdf containing the article onto a plasmoid similar to "magic folder", the reg-ex magic starts, you are asked for the missing info, the database collects everything and the file goes to the appropriate folder.
What do you think?
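
A minimal sketch of the "reg-ex magic" step, assuming the first page of the pdf has already been converted to plain text (for instance with a tool like pdftotext). The function names, the "first line is the title" heuristic and the folder layout are all made up for illustration; real articles would need far more robust rules.

```python
import re
from pathlib import PurePosixPath

# DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
# A four-digit year between 1900 and 2099.
YEAR_RE = re.compile(r"\b(19|20)\d{2}\b")


def extract_metadata(first_page_text):
    """Guess title, DOI and year from the plain text of an article's first page."""
    lines = [ln.strip() for ln in first_page_text.splitlines() if ln.strip()]
    doi = DOI_RE.search(first_page_text)
    year = YEAR_RE.search(first_page_text)
    return {
        # Naive assumption: the first non-empty line is the title.
        "title": lines[0] if lines else None,
        # Strip punctuation that often trails a DOI in running text.
        "doi": doi.group(0).rstrip(".,;)") if doi else None,
        "year": int(year.group(0)) if year else None,
    }


def missing_fields(meta):
    """Fields the dialog would still have to ask the user for."""
    return [key for key, value in meta.items() if value is None]


def target_folder(library_root, field_of_study, first_author):
    """Build the destination folder: <root>/<field>/<author> (hypothetical layout)."""
    return PurePosixPath(library_root) / field_of_study / first_author
```

Whatever `missing_fields` returns is what the dialog would ask for before the file is copied to `target_folder(...)`.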


RGB, proud to be a member of KDE forums since 2008-Nov.
And proud to be a kde user since 1.1.2
TheBlackCat
Registered Member
Posts
2945
Karma
8
OS
As I already pointed out in the other thread, strigi should be able to automatically detect and extract bibliographic information from every journal article on your hard drive. So all a user should have to do is save the article anywhere they want, strigi should then be able to detect it, extract the information, then store it to the database for easy retrieval later. So the okular integration is unnecessary.

As for extracting the information from the files, zotero can already do this.

Since this is a dedicated topic, I will cross-post my previous proposal:

For a paper manager, this is how I envision it:

* Strigi is used for searching for papers already on the hard drive, extracting citations from papers based on layout and/or DOI, and for full-text indexing the document's content. It would also scan the paper's own references in order to keep track of connections between papers and make it easier to search for related papers.
* Akonadi is used to store and retrieve papers. Authors and journals are similar to contacts and papers are similar to emails, so the change necessary to implement this should be minimal. This also allows easy development of alternative front-ends and sharing papers over a network or storing them on a remote computer.
* Akonadi is also used for retrieving citations from online indexes, like pubmed, both on-demand and automatically checking for new articles that fit certain criteria.
* Okular kpart is used for displaying papers
* Konqueror/rekonq has an integrated system similar to Zotero for retrieving papers and citations from web sites and storing them on the local drive
* A kio slave is available to find and work with citations. This would ideally seamlessly integrate local searching with searching on article databases, and would make use of the strigi search interfaces.
* A plugin in koffice is used to format and integrate papers into office documents (a similar plugin could be used for openoffice or even MS word). The papers cited in a document would be recorded by the akonadi backend, so you could easily retrieve a list of articles from a particular document you wrote.
* A program similar to kmail is used to find, organize, and display papers. It would include saved searches, an okular part to view articles, and an advanced search interface similar to the dolphin Facets search.
* A bibliography layout designer, accessible from both koffice and the main program, with GHNS integration for easily sharing layouts.
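
One piece of the list above, keeping track of connections between papers through their reference lists, can be sketched without any KDE machinery. This is plain Python, not Strigi or Akonadi code; the class and method names are hypothetical, and "related" here means two papers sharing references (bibliographic coupling), which is only one possible definition.

```python
from collections import defaultdict


class CitationGraph:
    """Tracks which paper cites which, keyed by an identifier such as a DOI."""

    def __init__(self):
        self.cites = defaultdict(set)     # paper -> papers it references
        self.cited_by = defaultdict(set)  # paper -> papers that reference it

    def add_paper(self, paper_id, reference_ids):
        """Record the reference list scanned from one paper."""
        for ref in reference_ids:
            self.cites[paper_id].add(ref)
            self.cited_by[ref].add(paper_id)

    def related(self, paper_id):
        """Papers sharing at least one reference with paper_id, most overlap first."""
        overlap = defaultdict(int)
        for ref in self.cites.get(paper_id, ()):
            for other in self.cited_by.get(ref, ()):
                if other != paper_id:
                    overlap[other] += 1
        # Sort by descending overlap, then by id so the order is stable.
        return sorted(overlap, key=lambda p: (-overlap[p], p))
```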


Man is the lowest-cost, 150-pound, nonlinear, all-purpose computer system which can be mass-produced by unskilled labor.
-NASA in 1965
The User
KDE Developer
Posts
647
Karma
0
OS
Not only KOffice should be supported, but also BibTeX/BibLaTeX with integration into Kile.

KBibTex supports fetching from Google Scholar and Bibsonomy. I do not have any experience with these services; I just wanted to mention that.

Not only PDFs should be referenceable but every web resource: sometimes there are HTML pages you want to refer to, or there is a .ps file or an ODF presentation…
lecoeus
Registered Member
Posts
5
Karma
0
OS
This is the single most important feature needed for the science suite of KDE because everybody suffers from this reference management problem. I think the best solution would be to port Kbibtex to Qt4 and add features to it. There was a Kpapers idea in the kde-apps discussion that I like. As far as I know, Kbibtex basically needs to extract data from pdfs, push citations to Kile and gain a fancy layout with an okular kpart. If the library management can be made good enough, one could just browse and open articles directly from that interface rather than with a file browser.

Another thing here is that for certain advanced features such as attaching commentary to articles, a bibtex file is not suitable. The storage at this point should be a database for Kbibtex with the option to export to bibtex whenever desired. Outside that program, a kio slave for pdf files and a zotero-like plugin to konqueror or rekonq would also be nice.
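
A minimal sketch of that storage idea, assuming SQLite as the database (the thread does not name one). The table layout and function names are hypothetical and not taken from Kbibtex; the point is only that commentary lives in the database while a BibTeX export stays available on demand.

```python
import sqlite3


def open_library(path=":memory:"):
    """Open (or create) the library database with entry and note tables."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS entry (
            key TEXT PRIMARY KEY, author TEXT, title TEXT,
            journal TEXT, year INTEGER
        );
        CREATE TABLE IF NOT EXISTS note (
            entry_key TEXT REFERENCES entry(key), body TEXT
        );
    """)
    return db


def add_entry(db, key, author, title, journal, year):
    db.execute("INSERT INTO entry VALUES (?, ?, ?, ?, ?)",
               (key, author, title, journal, year))


def add_note(db, key, body):
    """Attach a free-text comment to an entry; BibTeX has no good place for this."""
    db.execute("INSERT INTO note VALUES (?, ?)", (key, body))


def export_bibtex(db):
    """Render every entry as an @article record; notes stay in the database."""
    records = []
    rows = db.execute(
        "SELECT key, author, title, journal, year FROM entry ORDER BY key")
    for key, author, title, journal, year in rows:
        records.append(
            "@article{%s,\n  author = {%s},\n  title = {%s},\n"
            "  journal = {%s},\n  year = {%d}\n}"
            % (key, author, title, journal, year))
    return "\n\n".join(records)
```

Since the whole library is a single file in this sketch, copying that file is enough to move it to another computer.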

But I don't really see the benefit of using akonadi or strigi. For one thing, many people, including me, do not have strigi enabled, and Kbibtex would be enough for the job if it gained the required functionality. After all, you would not need all the references cited in every paper on the disk - that would mean thousands of citations for me. Akonadi may also have performance issues. It takes enough time to start up without the addition of data about hundreds of articles and there is nothing it would bring to the table. The alternative frontends can always use the database Kbibtex or another program generates. Also, Linux in general lacks a decent citation manager that is not as hideous looking as jabref and a KDE program would also be interesting to users of other environments which do not have Akonadi or strigi.
TheBlackCat
Registered Member
Posts
2945
Karma
8
OS
I think at the very least full-text searching of articles would be an essential feature, and that requires strigi anyway. Akonadi and nepomuk are not KDE-only; there is a gnome front-end for akonadi in the works as well (and akonadi depends on nepomuk). Further, akonadi is already designed to work with huge databases of objects linked to large numbers of files stored on disk (you may not need to keep track of many papers, but lots of people do).

But the main advantage is that we get most of what we need for free. We already mostly have the UI, data handling, database, linking database entries to files, network support, searching, indexing, categorization, linking related resources, sharing, display of files, listing, synchronization between computers, backup, and web retrieval. Most if not all that stuff would need to be implemented from scratch for something like kbibtex. We would even get things like smartphone support mostly for free (putting your papers on a smartphone or tablet, searching for papers on the road or at a conference, annotating papers on-the-fly, and so on).

Much of the work going into developing and maintaining KDE PIM would also benefit this program, while kbibtex requires writing and maintaining a completely separate code base for most of the same features. Plus akonadi is explicitly designed for flexibility, scalability, portability, and easy use of alternative front-ends (like a gnome front-end).

In other words, the sort of changes you are suggesting be made to kbibtex would, as far as I can see, essentially turn it into a program pretty much identical to what I am describing, but would likely require rewriting a lot of existing functionality from scratch.


The User
KDE Developer
Posts
647
Karma
0
OS
What should a PDF-KIO do?
RGB
Registered Member
Posts
341
Karma
0
OS
Maybe I'm a bit old fashioned, but I have a problem with fully automated solutions: the inability to decide when a paper must NOT be indexed.
I mean, not all the papers I find when searching about some topic are good enough to be preserved. In fact, LOTS of papers are not good enough...
And as lecoeus said, not everyone (me included) uses strigi, so a more "focused" alternative is quite interesting. That's why I thought about okular integration.
Classifying papers in "real time" while reading them, maybe adding some notes to them, is for me as important as (if not more important than) the indexing process, because this helps me understand what I'm reading, not only build citation indexes.


john_hudson
Registered Member
Posts
549
Karma
2
OS
You need to separate the elements of a bibliographic system:
1. database format - at the moment BibTeX is the most well-known but it was written for 7-bit ASCII and we really need a utf-8 version
2. a database management system or frontend; there are several programs for managing BibTeX files or you can take advantage of the code highlighting in Kate to manage BibTeX files
3. a system for extracting data and presenting it in a document; the LyX GUI or KILE for LaTeX do this if you are using the TeX engine and there is also a program to extract BibTeX data for use in Word/OpenOffice.

If you count LyX as a KDE program - since developing it was part of Matthias Ettrich's inspiration for KDE - we already have a comprehensive and well-supported bibliographic system.

If you want an introduction to LyX/Latex demonstrating it running under KDE4 and using Kate for BibTeX, go to http://bradlug.co.uk/?p=364


John Hudson, proud to be a member of KDE forums since 2008-Oct.
The User
KDE Developer
Posts
647
Karma
0
OS
I think BibLaTeX supports unicode.
john_hudson
Registered Member
Posts
549
Karma
2
OS
The User wrote:I think BibLaTeX supports unicode.


With Biber rather than BibTeX; you will run into problems using utf-8 with BibTeX. That incidentally is one of the great advantages of LyX; it allows you to enter utf-8 in the document which it then translates into an appropriate encoding before passing it to TeX for processing.
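
To illustrate the kind of translation involved when a utf-8 entry has to pass through 7-bit BibTeX, here is a minimal sketch. It is not how LyX does it; the table is a tiny hand-picked sample and the function name is hypothetical.

```python
import unicodedata

# Tiny illustrative table; a real converter would need far more entries.
LATEX_ESCAPES = {
    "é": r"{\'e}",
    "è": r"{\`e}",
    "ü": r'{\"u}',
    "ö": r'{\"o}',
    "ñ": r"{\~n}",
    "ç": r"{\c{c}}",
    "ß": r"{\ss}",
}


def to_ascii_bibtex(text):
    """Replace known non-ASCII characters with LaTeX macros; fail on unknown ones."""
    out = []
    # Normalise to precomposed form so "o" + combining diaeresis matches "ö".
    for ch in unicodedata.normalize("NFC", text):
        if ord(ch) < 128:
            out.append(ch)
        elif ch in LATEX_ESCAPES:
            out.append(LATEX_ESCAPES[ch])
        else:
            raise ValueError("no LaTeX escape known for %r" % ch)
    return "".join(out)
```

Raising on an unknown character is a deliberate choice in this sketch: silently dropping it would corrupt author names.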


lecoeus
Registered Member
Posts
5
Karma
0
OS
Point taken about rewriting the existing code - perhaps some structures of KDE pim can be used for this. But I still think requiring a running strigi is a bad idea because once turned on, that thing tries to index all of the drive and at least for me it never stops. Also, I can see non-KDE people complaining about large dependencies if the reference management program pulls in akonadi and nepomuk with it.

I honestly don't know if using a database as storage would require much coding effort if done independently from akonadi. It would be definitely easier with akonadi, but there is a tradeoff. Carrying a database around with you would be easier when you need to use a different computer or when you want to send your database to a friend. I am not sure how easy it is to carry the akonadi database around but it would be harder in comparison. You have a valid point with the existing functionality, though, and smartphone integration would be great.

I don't understand what you mean when you say the UI is ready. Do you mean the technical code interface or the GUI? No graphical layout I have seen in KDE pim programs comes close to what many would expect in a bibliography program.

At any rate, I will probably end up using whatever gets created for kde4 because right now there are no open alternatives except kde3 programs and jabref.
The User
KDE Developer
Posts
647
Karma
0
OS
Well, but LyX also has some disadvantages. ;)

BibLaTeX is simply more flexible than BibTeX…
TheBlackCat
Registered Member
Posts
2945
Karma
8
OS
lecoeus wrote: But I still think requiring a running strigi is a bad idea because once turned on, that thing tries to index all of the drive and at least for me it never stops.

That is a bug in strigi, and there has been a lot of work in fixing these bugs with strigi lately. I expect the issues would be worked out before an alternative implementation could be written from scratch.

Whatever the case, thanks to the flexibility of akonadi it would probably be possible to have multiple different approaches that could be enabled or disabled separately. Full-text searching of all parts of your hard drive would be a feature that could be enabled or disabled, just like individual online database searches could be enabled or disabled.

lecoeus wrote:I honestly don't know if using a database as storage would require much coding effort if done independently from akonadi

A database probably wouldn't, but a database that is fast for huge numbers of entries, supports full-text searching of articles, is supported on multiple platforms, and allows for multiple simultaneous reading and writing would be considerably more difficult, I expect.
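
To make the full-text part concrete, here is a minimal in-memory inverted index. It is only a sketch with hypothetical names, not how strigi or akonadi work internally, and it deliberately ignores the hard parts named above (scale, persistence, concurrent readers and writers).

```python
import re
from collections import defaultdict


class FullTextIndex:
    """Maps each lowercase word to the set of papers whose text contains it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, paper_id, text):
        """Index every word of a paper's text under the paper's id."""
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(paper_id)

    def search(self, query):
        """Return the sorted ids of papers containing every word of the query."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return []
        # Copy the first posting set so intersecting does not modify the index.
        hits = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            hits &= self.postings.get(word, set())
        return sorted(hits)
```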

lecoeus wrote:It would be definitely easier with akonadi, but there is a tradeoff. Carrying a database around with you would be easier when you need to use a different computer or when you want to send your database to a friend. I am not sure how easy it is to carry the akonadi database around but it would be harder in comparison.

It would be very easy, since someone already developed this for gsoc (see Here and Here). Once again, this is something that a PIM application needs as well, so we would get it for free.

lecoeus wrote:I don't understand what you mean when you say the UI is ready. Do you mean the technical code interface or the GUI? No graphical layout I have seen in KDE pim programs comes close to what many would expect in a bibliography program.

I think kmail's interface is pretty close. In fact it is almost the same as the papers interface shown on the kpapers page. Papers has an additional panel for tags (which KDE provides), the toolbar is at the bottom rather than the top (which KDE supports), and there are some filters above the papers list (which the dolphin facets panel could provide a more powerful version of), but otherwise they are quite similar in my opinion.


lecoeus
Registered Member
Posts
5
Karma
0
OS
I see. Meanwhile I compiled the kde4 port of Kbibtex from svn yesterday and it is already looking pretty good considering where it was a little time ago. Perhaps the developer will do something about Nepomuk integration or at least a database storage in the future.
john_hudson
Registered Member
Posts
549
Karma
2
OS
The User wrote:Well, but LyX also has some disadvantages. ;)

BibLaTeX is simply more flexible than BibTeX…


Which is why the LyX developers are working on moving to BibLaTeX.

Having used LyX in its various versions since 2000, I can say that the current group of developers has made dramatic improvements in its usability in the past four years, not least by rewriting virtually all the documentation to explain how to insert LaTeX code for those things which LyX does not yet, or may not ever be able to, support.



 