Open Repository Blog

Thursday Mar 20, 2008

File-type analysis tool

Yesterday, another new feature was added to the Open Repository service. The file-type analyzer, linked from the admin menu, gives administrators a breakdown of the content in their repository: the number of items and the file types they contain.

The main page lists the various file types within the repository, the number of items containing each file type, and how many files of each type there are in total (e.g. 10 items with Adobe PDF and 12 PDF files in total).

You can drill down into the details, so you can view each of those 10 items and see exactly how many files of each type an item contains.

The display also shows the reverse: which items don't contain a certain file type, and which files those items do contain.

The total number of metadata-only items is also displayed, and again this can be expanded to view each of these items.

The total number of items with files plus the total number of metadata-only items equals the total number of items in the repository.
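For anyone curious about the counting behind the analyzer, here is a minimal Python sketch. It is not the Open Repository code: it assumes a hypothetical in-memory mapping from item IDs to the format names of each item's files, where an empty list marks a metadata-only item, and the function names are invented for illustration. Items and files are counted differently, which is how 10 items can hold 12 PDFs.

```python
from collections import Counter, defaultdict

# Hypothetical input: item ID -> list of format names, one entry per file.
# An empty list marks a metadata-only item.

def file_type_breakdown(repo):
    """Return (items_per_type, files_per_type, metadata_only)."""
    items_per_type = defaultdict(set)  # format -> IDs of items containing it
    files_per_type = Counter()         # format -> total number of files
    metadata_only = []
    for item_id, formats in repo.items():
        if not formats:
            metadata_only.append(item_id)
            continue
        for fmt in formats:
            items_per_type[fmt].add(item_id)  # a set, so each item counts once
            files_per_type[fmt] += 1          # every file counts
    return items_per_type, files_per_type, metadata_only

def items_without_type(repo, fmt):
    """The reverse view: items with files but none of type fmt, and what they do contain."""
    return {item_id: Counter(formats)
            for item_id, formats in repo.items()
            if formats and fmt not in formats}

def totals_add_up(repo):
    """Check: items with files + metadata-only items == all items in the repository."""
    items_per_type, _, metadata_only = file_type_breakdown(repo)
    with_files = set().union(*items_per_type.values())
    return len(with_files) + len(metadata_only) == len(repo)
```

With `repo = {"a": ["PDF", "PDF", "Word"], "b": ["PDF"], "c": []}`, PDF appears in 2 items but as 3 files, item "c" is metadata-only, and 2 items with files plus 1 metadata-only item gives the 3 items in the repository.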

This tool is available for all production and pilot repositories.

We've also added RefWorks as one of the citation management download options, alongside EndNote and Reference Manager.

Thursday Feb 21, 2008

New item move tool released for Open Repository

It's been a little quiet on the blog this month, if not in the office. We're proud to announce that Medecins Sans Frontieres, Northumbria University and Helsebibliotekets are all moving towards their full production releases, and we're also delighted to welcome the Museum of London as our newest customer.

As always, we've been fixing and tweaking where we can, improving speed and efficiency as we go, especially on the browse pages. Meanwhile, DSpace 1.5, the majority of which we implemented last year, has gone into beta testing ahead of its planned release in March.

We've also just released a new tool, a late addition to DSpace 1.5, that will help administrators with requests to move content around the repository. Whilst a tool to drag and drop communities and collections is still some way off, items can now be moved from one collection into another. Furthermore, mapped items can also be moved to a new mapping location.

So if you want to move the contents of one collection into another, or spread them across a couple of collections, this can now be done, albeit one item at a time. With a little careful planning (creating new collections, moving the items across and then deleting the old collections), what we essentially have is a tool that enables a moveable hierarchy!
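To illustrate how a move can apply either to an item's home collection or to one of its mappings, here is a rough Python sketch. The `Item` model (one owning collection plus a set of mapped collections) and the function names are hypothetical, not the DSpace API; `empty_collection` mirrors the "one item at a time" approach to clearing out a collection before it is deleted.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    """Hypothetical model: one owning collection plus any mapped collections."""
    handle: str
    owner: str
    mapped: set[str] = field(default_factory=set)

def move_item(item: Item, source: str, target: str) -> None:
    """Move an item out of `source` into `target`, whether source is its owner or a mapping."""
    if source == target:
        return
    if source == item.owner:
        item.owner = target
        item.mapped.discard(target)  # avoid a redundant mapping to the new owner
    elif source in item.mapped:
        item.mapped.remove(source)
        if target != item.owner:
            item.mapped.add(target)
    else:
        raise ValueError(f"{item.handle} is not in {source}")

def empty_collection(items, source, target_for):
    """Move every item out of `source`, one at a time, so the collection can be deleted.
    `target_for(item)` is a hypothetical callback choosing each item's destination."""
    for item in items:
        if source == item.owner or source in item.mapped:
            move_item(item, source, target_for(item))
```

Passing a different `target_for` callback is how the contents of one collection could be split across a couple of new ones.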

The item move is carried out through the Edit item page, and full instructions have been sent to senior administrators and will be added to the next version of the admin manual, due at the end of this month.


Friday Feb 01, 2008

January Update

With the arctic winds blowing the threat of snow down to London, it's time to catch up with what's happened to OR during January before we're all potentially stranded by a few flakes bringing the capital's transport system to a shivering halt.

It's been a busy month; we hit the ground running and haven't stopped, with improvements to both the front and back end of the service.

Hopefully, you'll have noticed a marked improvement in speed when using your repositories. We've focused on further improving many of the processes that drive the service and cutting down on memory usage, and we estimate that this work has boosted performance by about 25%. Yesterday, we moved to new web servers with more memory and processing power, which will not only improve efficiency even further but also ensure that, in the long term, there will be no loss in performance as your repositories grow and usage increases.

At the front end, we've mainly focused on three areas: in-submission document conversion, EndNote and RefMan imports, and search, with the following enhancements:
  • The submission form now recognises OpenOffice document formats (ODF) when added as bitstreams to a submission.
  • The document conversion tool will now convert Microsoft Office documents (Word, Excel, PowerPoint) to their OpenOffice counterparts, as well as to PDF.
  • The search results can now be ordered in the same manner as the browse results, offering:
    • ordering by relevance, title, issue date or submit date
    • setting the number of results displayed per page
    • ascending or descending order
    • limiting the number of authors displayed with each result
  • The advanced search now has a date restriction field, which offers only the range of dates that actually exist within the repository.
  • Full-text items in non-Latin alphabets are now searchable. For example, if you have a document written in Arabic and you search in Arabic, the item will be returned in the search results whenever the document matches your search.
  • The tool to download item metadata to either EndNote or RefMan remains available to admins only for the moment, as does the additional search enhancement to download a page of search results at a time to EN/RM. It'll be released as soon as we've completed further testing.
  • We've also been working on the reverse, the importing of EndNote and RefMan library files, also still in testing.
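The date restriction field offers only dates that actually exist in the repository. A minimal Python sketch of that idea follows; the result dictionaries with an `issued` date, and all the function names, are hypothetical stand-ins for whatever the search index actually returns.

```python
from datetime import date

# Hypothetical result shape: {"title": str, "issued": date}.

def available_year_range(issue_dates):
    """Return (earliest, latest) issue year present, or None if there are none.
    Only this span would be offered in the date restriction field."""
    years = [d.year for d in issue_dates]
    return (min(years), max(years)) if years else None

def clamp_range(requested, available):
    """Trim a requested (start, end) year pair to the years that actually exist."""
    start, end = requested
    lo, hi = available
    return max(start, lo), min(end, hi)

def restrict_by_date(results, start_year, end_year):
    """Keep results whose issue year falls within [start_year, end_year], inclusive."""
    return [r for r in results if start_year <= r["issued"].year <= end_year]
```

Deriving the range from the data means nobody is offered, say, 1950 to 1990 in a repository whose earliest item was issued in 2001.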
Next month we're looking forward to further progress with LDAP authentication, and to releasing a nice little tool that will tell you which file types are in the repository, how many of each there are and which items they are in, amongst other things. As ever, I'll be posting more details in a few days. Have a great weekend.

Thursday Jan 17, 2008

Our monthly task list explained

Halfway through January seems a good time to cast a more general eye over what we're doing at the moment, and perhaps shed some light on what it is we get up to in the daylight hours.

Each month we create a task list, broken down into roughly 4 areas: new repositories, major projects, small projects, and individual customer change requests. Not every area will be represented each month, and occasionally we'll create a new category for something that doesn't quite fit the bill.

New repositories can be either new trial (pilot) or new production repositories. This month we've added the HSE pilot, and the Royal College of Nursing pilot should be ready in a couple of days.

Major and small projects cover anything that changes a feature or function across the entire code base. Individual customer change requests are front-end, organisational, form or text changes made for a specific repository. These changes can all be made using repository-specific files that don't require changes to the shared code, so they won't affect other repositories. For example, anonymous customer A asks us to add a new field to their submission form, change the word 'community' to 'directory' throughout, remove a box from the homepage and update their URL. These all go on the individual customer change request list. Anonymous customer B later asks us to develop a way to download content to an iPod. That would require changes to the code, so after analysis, some head-scratching, occasional head-shaking and prioritisation, the request eventually becomes either a small or a major project, depending on the work involved. Please don't ask us about the iPod idea, though.

Small projects usually take no more than a couple of weeks to complete and tend to be confined to a single piece of functionality. Major projects are longer-term changes, often with architectural implications, and can take months to complete. Last year's upgrade was a major project; within it were many small projects, for example adding the list of latest submissions to the homepage. I could complicate matters further by saying that some major projects can be rather nebulous, such as 'make search better', but I won't, as these end up being broken into a number of related small projects, such as 'search within a specified date range', that won't necessarily be worked on at the same time.

The process of arriving at the monthly task list involves a great deal of careful consideration, analysis, negotiation and a lot of wild gesticulating. We maintain a database of all customer requests, ideas and suggestions, which are a mixture of small change requests and project ideas. Each month we'll endeavour to complete as many of the small change requests as possible. If there are too many, however, we discuss the urgency with each customer and prioritise accordingly. Project ideas go into a separate melting pot, which also contains our own ideas and work being carried out within the DSpace community. Deciding which tasks get pulled out and worked on depends on various factors: the popularity of the request, its urgency, its usefulness to the widest customer base, the complexity of the task, how the work fits in with the overall schedule, whether or not the work's being looked at elsewhere, who's available to work on it and possibly whether or not we're in the mood that day.

This month, other than the new pilots already mentioned, we're focusing on small change requests for Landspitali, Exeter, Wolverhampton and Medecins Sans Frontieres. The rest of our tasks are all small projects. I've spoken previously about the work we're doing on search: ordering results and restricting them to date ranges. We're also going to look into full-text searching of non-Latin characters and restricting searches to items with full text. We're currently testing the export to EndNote button, and have started working on how to batch import EndNote files and libraries. Thanks to those who have sent test data through for us to play with.

I hope that running each monthly task list on the blog will give a clearer idea of what we're working on and what to expect. From time to time I'll supplement this with updates on our progress towards longer-term goals.

Wednesday Jan 16, 2008

Search? Sorted.

Screenshot: Open Repository search ordering

As of only a few minutes ago, you can now sort your search results on Open Repository in the same manner as the browse options: by relevance, title, submission date or issue date. Results can be listed in ascending or descending order, with options for the number of results per page and the number of authors displayed against each result. The initial results list is always ordered by descending relevance, in other words with the most relevant results at the top.

Where community or collection names are found in response to a search query, they will still be displayed on the first page of search results, but are not affected by ordering the results.
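The ordering options boil down to sort, reverse and slice. Here is a minimal Python sketch under assumed names: `Result`, `SORT_KEYS` and `order_results` are hypothetical, not the DSpace search API. Community and collection matches would be kept in a separate list and never passed to this function, so ordering doesn't affect them.

```python
from dataclasses import dataclass, replace
from datetime import date

@dataclass
class Result:
    """Hypothetical shape of an item search hit."""
    title: str
    relevance: float
    issued: date
    submitted: date
    authors: list[str]

# One sort key per ordering option.
SORT_KEYS = {
    "relevance": lambda r: r.relevance,
    "title": lambda r: r.title.lower(),  # case-insensitive title order
    "issue_date": lambda r: r.issued,
    "submit_date": lambda r: r.submitted,
}

def order_results(results, sort_by="relevance", descending=True,
                  page=1, per_page=10, max_authors=None):
    """Sort, then slice out one page. Defaults to descending relevance."""
    ordered = sorted(results, key=SORT_KEYS[sort_by], reverse=descending)
    start = (page - 1) * per_page
    page_items = ordered[start:start + per_page]
    if max_authors is not None:
        # Return trimmed copies rather than mutating the original results.
        page_items = [replace(r, authors=r.authors[:max_authors]) for r in page_items]
    return page_items
```

Sorting the whole list before slicing matters: paging first and then sorting would only reorder the hits on the current page.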

We will now start work on the interface for restricting a search to within a specified date range.

Wednesday Oct 10, 2007

Choosing custom options

We've almost completed the conversion of the old site interfaces to the new version, and everything is looking good. There hasn't been much feedback yet, so I'm hoping that means everyone is happy with the new look and feel. It shouldn't be long now before we can put up test instances of each of our repositories for people to look at individually before we go live.

Our next step is to set up the custom options for each of our current live customers. Apologies, but these customisations are not available to pilot customers. The main choices to be made are which fields appear on the submission form, which fields are used for the advanced search indexes and which fields are used for the browse menus.

We've set up a short online form for you to make these choices. Let me know if you have any questions or problems with the form.

Wednesday Sep 26, 2007

Google Analytics

Something I've yet to mention is that we're adding Google Analytics to all the live Open Repository sites with the 1.4.1 release. GA has recently relaunched with a new interface and we have to say that we're really impressed with what it does and what it can do.

You can (just) see from the screenshot of the main Visitors overview below that GA offers a host of welcome reporting tools.

 Google Analytics Overview Screenshot

GA offers three main sections:

  • Visitors - where your visitors are coming from, what they're doing and what they're using.
  • Traffic Sources - where your traffic is coming from and which search engines are referring them.
  • Content - which pages are getting the most hits, how people move through the site and how long they stay there.

At each stage you can drill down through the layers, create different reporting views (maps, graphs etc), change the date range you're analysing and export the results into a variety of different formats (PDF, XML, CSV, TSV). You can even customize your main 'dashboard' page to display the key reporting areas you require.

We're still really enjoying playing with GA and, without wishing to descend into marketing cliché, we think you will too. It's incredibly intuitive, very flexible and covers many of the requests made to improve the current DStat statistics reporting package (which will remain part of the Open Repository service for the moment). Meanwhile, the DSpace community is discussing whether Google Analytics could eventually replace DStat in a future release.

Monday Sep 24, 2007

Customer surveys and what happens next; a post in two parts - part 2

So here's part 2 of the post I began on Wednesday. This time I want to take a look at what we have planned next for Open Repository. We've taken all the feedback on potential new features gathered through the user surveys and combined it with a list of all the requests and suggestions that have come in from customers. Without any weighting applied to each suggestion, the combined list (in random order) looks something like this:

  • re-arrange the hierarchy (e.g. move content between collections or collections between communities).
  • add date limiters to advanced search.
  • export metadata into citation management software such as EndNote, RefMan etc.
  • import citation management software files (EndNote, RefMan etc) into repository, pre-filling metadata fields on import.
  • display links to local sites or resources on home page.
  • display most downloaded articles on homepage.
  • display most recent additions on home page.
  • improved action validation and messaging on admin site.
  • import content from local databases.
  • display list of communities and collections in any desired order.
  • display random or chosen highlighted article of the month.
  • add MeSH terms to submission form fields and to the PubMed pre-fill and OA datafeeds.
  • switch between different language displays of the interface.
  • distinguish internal authors from others.
  • add thesauri or controlled vocabulary for keywords on submission form.
  • add RAE API for UK repositories
  • allow users to create submission rights at registration.
  • create 'dark archives' or hidden collections.
  • add Journal to submission form fields and to the PubMed pre-fill and OA datafeeds.
  • choose which columns are displayed for search / browse results
  • item embargo periods
  • create links between metadata fields (e.g. all author names would link to the author browse results).
  • restrict registration to internal members.
  • integrate the SHERPA Romeo API to the submission form.
  • improve messaging on the submission form.
  • allow local blog feeds to be displayed on the home page.
  • allow submission to multiple collections at same time.
  • more defined statistics reporting, especially at community and collection levels.
  • edit items in the same style as the submission form.
  • allow authors and / or submitters to edit their own items.
  • order items within the workspace by date submitted.
  • automatically populate researcher pages when new content is added.
  • enable auto-login.
  • allow browser back buttons to be used in the admin system.
  • allow collection editors to edit items.
  • check for duplicated content during submission.
  • allow admins to add content to researcher pages.
  • enable browsing of researcher pages
  • make the fields displayed in researcher pages customizable.
  • make the submission form fields customizable.
  • create customized page layouts.
  • allow choice of browse menus to be displayed.
  • allow choice of metadata fields searched on.
  • enable lists of institutional users to be uploaded directly into the database.
  • create additional pre-fill options (e.g. PubMed Central or arXiv).
  • automatically extract metadata from uploaded documents to pre-fill submission form.
  • enable handling of E-theses.
  • display download statistics against each item.
  • add additional email alert / RSS options (subject, author, journal etc)
  • add a 'request a researcher page' button
  • render (display) XML files as HTML
  • version control for content
  • LDAP authentication

That's a tough list by any standards, especially with so many good ideas up there. The good news is that those items in blue will be included in the 1.4.1 release.

In order to choose which items would go on to the development list for the remainder of 2007 and into the early stages of 2008, we took a number of factors into consideration. We combined the most requested features from the surveys with the most requested features from the requests that have been sent in, to form a list of the 9 most desirable things to work on:

  • exporting metadata to and importing metadata from citation management software (EndNote, RefMan etc)
  • item download statistics
  • item embargoes
  • limit advanced search by date
  • link metadata to browse menus
  • LDAP authentication
  • moving the hierarchy
  • customizable interface
  • saved searches

That list was then whittled down to 4 items that could be completed within the next three to four months. In narrowing the list further, we looked at what work is being done in DSpace (so as not to duplicate effort), what could usefully be contributed back to the DSpace code, what features other repository software solutions have but we don't, which features could be completed without tying up all our development resources for months on end, and what best fitted the service overall, benefiting as many customers as possible.

And the winners are:

  • exporting metadata to and importing metadata from citation management software (EndNote, RefMan etc)
  • LDAP authentication
  • item embargoes
  • item download statistics

I hope there won't be too much dissent over our decision. Once we're clear of 1.4.1 and have gathered feedback on the release, we'll resend this list and ask for further comment, so that we can plan a more comprehensive roadmap for 2008 to fit around the release of Manakin, DSpace's customizable interface project.

Wednesday Sep 19, 2007

One bug and two fixes to go

That's right, readers: we're down to our last bug (the document conversion tool has unconfigured itself) and two final small changes to the interface. We've been terminating our bugs with extreme prejudice, with five coming off the list yesterday, and we expect these final fixes to be finished by the end of today. All being well, we should be in a position to declare the release 'stable' tomorrow.

On Friday we'll all be out of the office on a company training away day, so the public test site now won't be available until next week. There will, of course, be an announcement as soon as it is.